This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
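As a rough illustration of what a typed citation might look like, the sketch below builds one with Python's standard library. The `citation` element, its `key` attribute and the `unstructured_citation` child are used as they appear in Crossref deposits. The helper name `build_citation` and the value `dataset` are assumptions made for this example, so please check the 5.4.0 schema documentation for the permitted type values.

```python
import xml.etree.ElementTree as ET


def build_citation(key: str, citation_type: str, text: str) -> ET.Element:
    """Build a <citation> element carrying a 'type' attribute (illustrative only)."""
    citation = ET.Element("citation", {"key": key, "type": citation_type})
    unstructured = ET.SubElement(citation, "unstructured_citation")
    unstructured.text = text
    return citation


# "dataset" is an assumed example value; consult the schema for the real list.
example = build_citation("ref1", "dataset", "Example Author (2024). Example dataset.")
print(ET.tostring(example, encoding="unicode"))
```

The point is only that the type travels with the citation itself, so a metadata user can tell what kind of object is being cited even when no identifier is present.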
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
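If you are adapting a workflow to accept ROR IDs alongside Funder IDs, a small format check can catch obvious mix-ups before deposit. The following is a minimal sketch, not part of any Crossref or ROR tooling: the function name `normalize_ror` is hypothetical, the pattern reflects the published ROR identifier format (a leading 0, six characters from a restricted alphabet, two checksum digits), and the checksum itself is not verified here. The identifier in the usage line serves only as a format example.

```python
import re

# Nine-character ROR ID body: a leading 0, six characters from a restricted
# base32 alphabet (no i, l, o, u), then two checksum digits.
# Assumption: only the shape is checked; the checksum is NOT validated.
ROR_BODY = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$")


def normalize_ror(value: str) -> str:
    """Return the https://ror.org/ URL form of a ROR ID, or raise ValueError."""
    body = value.strip().lower()
    for prefix in ("https://ror.org/", "http://ror.org/", "ror.org/"):
        if body.startswith(prefix):
            body = body[len(prefix):]
            break
    if not ROR_BODY.match(body):
        raise ValueError(f"not a well-formed ROR ID: {value!r}")
    return "https://ror.org/" + body


print(normalize_ror("02twcfp32"))
```

A value that looks like a Funder ID (a DOI beginning with 10.13039) fails the check, which makes it easy to route the two identifier types separately.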
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
At Crossref we mint DOIs for publications and send them out into the world, but we like to hear how they’re getting on out there. Obviously, DOIs are used heavily within the formal scholarly literature and for citations, but they’re increasingly being used outside of formal publications in places we didn’t expect. With our DOI Event Tracking / ALM pilot project we’re collecting information about how DOIs are mentioned on the open web to try to build a picture of new methods of citation.
As part of the preparation for collaborating with Wikipedia, we looked at our statistics about when DOIs are clicked and discovered that Wikipedia was, over a two-year period from 2012, the eighth largest referrer of DOIs. This means that not only does Wikipedia have a lot of DOIs, but people click them too. This bit of one-off data analysis (which surprised us) gave us enough of a prod to kickstart our collaboration with Wikipedia.
At the ALM Workshop 2014 in San Francisco we talked to some Wikipedians and bibliometricians and realised that we were sitting on a really interesting data-set and that it would be churlish not to share it. At the hackathon (read the report here) we started work on a service to gather information about DOIs and, a month later, we’re ready to unveil the DOI Chronograph.
And, the chart that kicked this all off: DOI referring domains league tables. This shows that Wikipedia is the 3rd or 4th non-traditional referrer of DOIs (i.e. excluding referrals from Publishers’ domains): http://chronograph.labs.crossref.org/top.html
Talking to a bibliometrician we also realised we can correlate other data for DOIs. We’re getting the issue date (approximately the publication date) from our own metadata, as well as the date that the Crossref metadata was updated. This gives interesting results, like the resolutions for 10.1038/ncomms2953, which peak after publication and then tail off. We are attempting to collect the following information:
daily resolution counts
day on which resolution was first successful
day on which it’s possible to resolve the DOI (we’ve got a bot running for new publications)
day on which the publisher says the article was published
day on which the metadata was most recently deposited with us
day on which the metadata was first deposited with us
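To make the list above concrete, here is one way such a per-DOI record could be modelled. This is a sketch under our own assumptions rather than the Chronograph's actual data model: the class name `DoiTimeline`, its field names and the two helper methods are all hypothetical, and the counts in the usage lines are made up purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class DoiTimeline:
    """Per-DOI record of the dates and counts listed above (hypothetical layout)."""
    doi: str
    daily_resolutions: Dict[date, int] = field(default_factory=dict)
    first_resolution_success: Optional[date] = None
    first_resolvable: Optional[date] = None
    publisher_issue_date: Optional[date] = None
    metadata_last_deposited: Optional[date] = None
    metadata_first_deposited: Optional[date] = None

    def peak_day(self) -> Optional[date]:
        """Day with the highest resolution count, or None if there is no data."""
        if not self.daily_resolutions:
            return None
        return max(self.daily_resolutions, key=self.daily_resolutions.get)

    def days_from_issue_to_peak(self) -> Optional[int]:
        """Days between the publisher's issue date and the busiest day."""
        peak = self.peak_day()
        if peak is None or self.publisher_issue_date is None:
            return None
        return (peak - self.publisher_issue_date).days


# Invented numbers for a test-prefix DOI, only to show how the record is used.
timeline = DoiTimeline(doi="10.5555/12345678", publisher_issue_date=date(2014, 1, 1))
timeline.daily_resolutions = {date(2014, 1, 1): 3, date(2014, 1, 3): 10, date(2014, 1, 9): 2}
print(timeline.peak_day(), timeline.days_from_issue_to_peak())
```

Keeping every date optional reflects the fact that not all of this information is available for every DOI yet.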
We’re not there yet, but we’ve made a start and we’ve already got some pretty interesting data!
Weasel words
It’s a labs project so the usual weasel words apply. Specifically, we currently have the logs for 2012 to 2014 (we’re working on digging out the rest), and the referral information for 50 million DOIs (out of 71 million). That number will be higher by the time you read this. If your page is slow to load, be patient, as it’s currently working hard crunching numbers.
This project is focused on exploring the use of DOIs outside of the formal literature. As such, we are only looking at referrals from domains that do not appear to belong to primary publishers (i.e. our members). If you try a domain and it doesn’t work, it could be that the domain belongs to one of our members. If you’ve noticed any mistakes, please email us at labs@crossref.org.
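The filtering described here, and the league table it feeds, can be sketched in a few lines. This is an illustration under stated assumptions, not the Chronograph's code: the function names are hypothetical, the member domain list is a stand-in, and a referrer is treated as a member's if its host equals a member domain or is a subdomain of one.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlsplit


def referrer_domain(referrer_url: str) -> str:
    """Extract the lower-cased host name from a referrer URL ('' if absent)."""
    return (urlsplit(referrer_url).hostname or "").lower()


def belongs_to(domain: str, member_domains: Set[str]) -> bool:
    """True if the domain is a member domain or a subdomain of one."""
    return any(domain == m or domain.endswith("." + m) for m in member_domains)


def league_table(referrers: Iterable[str], member_domains: Set[str]) -> List[Tuple[str, int]]:
    """Count referrals per non-member domain, most frequent first."""
    counts: Counter = Counter()
    for url in referrers:
        domain = referrer_domain(url)
        if domain and not belongs_to(domain, member_domains):
            counts[domain] += 1
    return counts.most_common()


# Stand-in data: "publisher.example" plays the role of a member's domain.
members = {"publisher.example"}
referrers = [
    "https://en.wikipedia.org/wiki/Digital_object_identifier",
    "https://www.publisher.example/article/1",
    "https://en.wikipedia.org/wiki/Crossref",
    "http://blog.example.net/post",
]
print(league_table(referrers, members))
```

Referrals with no usable host name are dropped rather than counted, which is one reason a domain you try might not show up.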
Finally, these numbers contain all DOI resolutions. That’s human clicks but also content negotiation to retrieve metadata, robots etc. We might try to filter them in future, but for now be aware that not every visitor is a human.
I’ll detail some of the technical stuff (it’s very interesting) and what happened next with Wikipedia in a future post. Watch this space.