We’ve been talking a lot about infrastructure here at Crossref, and how the metadata we gather and organize is the foundation for so many services: those we provide directly, and those that use our APIs to access that metadata, such as Kudos and CHORUS, which in turn provide the wider world of researchers, administrators, and funders with tailored information and tools.
The initiative formerly known as FundRef
Together, Crossref’s funding data (previously known as FundRef; we simplified the name) and the Open Funder Registry, our taxonomy of grant-giving organizations, form a hub for gathering and querying metadata related to two questions:
“Who funded this research?” and “Where has the research we funded been published?”
To support the funding data initiative, three key pieces of metadata are needed from publishers:
Funder ID
Funder Name
DOI
Unfortunately, only around half of the 950,000 Crossref DOIs with funding data contain funder IDs, the unique funder identifiers from the Open Funder Registry that are needed to link up all of the data. So only half of the data is useful. (And 950,000 DOIs is only a fraction of the 77 million DOIs in our database, but more on that later.)
When we looked at the funding data that was coming in without funder IDs, we were a little surprised. We had expected that most of these would be names that simply aren’t in the Open Funder Registry yet, and we thought there would be a certain amount of incorrect information that had been entered into the “funder_name” field.
Instead, what we found was that many of the names were correct, and the funder IDs were just missing.
Tidying the data
To help correct this, we decided to match incoming names to funder IDs where we could do so with the highest level of confidence. After much testing to minimize false positives, we switched this on at the end of August 2015.
Throughout September and October, we inserted funder IDs for about 25% of the names that had been deposited without IDs. For October, the actual numbers were 68,000 funder names deposited with no IDs, and 18,000 funder IDs inserted by Crossref.
In the same period 42,000 funder IDs were deposited by publishers. With our matching on top of this, we are achieving a little over a 50% overall success rate of “good” funding data (funder names and funder IDs together).
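The percentages above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 42,000 publisher-deposited IDs and the 68,000 ID-less names are separate groups that together make up the October total, which the figures suggest but the post does not state outright.

```python
# October figures quoted in the post.
names_without_ids = 68_000            # funder names deposited with no ID
ids_inserted_by_crossref = 18_000     # IDs added by Crossref's matching
ids_deposited_by_publishers = 42_000  # IDs supplied by publishers

# Share of ID-less names that the matching managed to fix.
match_rate = ids_inserted_by_crossref / names_without_ids

# Assumption: the two groups do not overlap, so the total is their sum.
total_funder_entries = names_without_ids + ids_deposited_by_publishers
good_entries = ids_deposited_by_publishers + ids_inserted_by_crossref
good_rate = good_entries / total_funder_entries

print(f"Matched by Crossref: {match_rate:.1%}")  # 26.5%
print(f"Good funding data:   {good_rate:.1%}")   # 54.5%
```

Under that assumption, the matching lifts the share of usable entries from 42,000 of 110,000 (about 38%) to 60,000 of 110,000 (about 55%).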
We have been very careful to distinguish the funder IDs that we have added from those deposited by publishers; provenance of data is an extremely important part of what we do. All funder IDs are tagged as provided either by the publisher or Crossref. Every time we insert an ID into a deposit, the publisher is notified in the deposit report.
We have also now added these tags to our REST API so that publishers can query to find out exactly which DOIs we have amended*. The ideal scenario at this point is that the publisher checks that they are happy with the matching and then redeposits the funding data for those DOIs, overwriting the doi-asserted-by: “crossref” tag and claiming the metadata as their own.
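As a rough illustration of how a publisher might review those tags, the sketch below scans work records shaped like the items returned by the REST API and lists the DOIs where at least one funder ID was asserted by Crossref. The function name and the sample records are invented for illustration, and the record layout (a "funder" list whose entries carry "name", "DOI", and "doi-asserted-by") is assumed here rather than taken from this post.

```python
from typing import Any, Dict, List


def dois_with_crossref_asserted_funders(items: List[Dict[str, Any]]) -> List[str]:
    """Return DOIs of works with at least one funder ID added by Crossref."""
    amended = []
    for item in items:
        funders = item.get("funder", [])
        if any(f.get("doi-asserted-by") == "crossref" for f in funders):
            amended.append(item["DOI"])
    return amended


# Made-up records for illustration only; not real API output.
sample_items = [
    {"DOI": "10.5555/example.1",
     "funder": [{"name": "National Science Foundation",
                 "DOI": "10.13039/100000001",
                 "doi-asserted-by": "crossref"}]},
    {"DOI": "10.5555/example.2",
     "funder": [{"name": "National Science Foundation",
                 "DOI": "10.13039/100000001",
                 "doi-asserted-by": "publisher"}]},
    {"DOI": "10.5555/example.3"},  # no funding data at all
]

print(dois_with_crossref_asserted_funders(sample_items))
```

The DOIs this returns are the ones a publisher would check and then redeposit to claim the funder IDs as their own.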
Setting some limits
The second largest problem with funding data was incorrectly entered funder names, e.g. a concatenation of several names, or authors entering overly long or vague program names instead of the official funder name.
To help weed this out, we have made a couple of changes to the funding data deposit system:
The funder_name field can no longer contain a numerical string of more than 4 digits
The funder_name field can no longer contain a text string of more than 200 characters
Funder names that do not adhere to these two rules will now cause the funding data section of the metadata deposit (not the whole deposit) to fail and return an error message.
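Publishers who want to catch these problems before depositing could run a similar check on their side. The following is a minimal sketch, not Crossref's actual validation: the function name and error messages are hypothetical, and it assumes that "a numerical string over 4 digits" means a run of five or more consecutive digits.

```python
import re
from typing import List

MAX_NAME_LENGTH = 200                    # limit stated in the post
LONG_DIGIT_RUN = re.compile(r"\d{5,}")   # assumption: 5+ consecutive digits


def funder_name_errors(name: str) -> List[str]:
    """Return the reasons a funder name would break the two rules."""
    errors = []
    if LONG_DIGIT_RUN.search(name):
        errors.append("contains a numerical string of more than 4 digits")
    if len(name) > MAX_NAME_LENGTH:
        errors.append("is longer than 200 characters")
    return errors


for candidate in ["National Science Foundation",
                  "Horizon 2020",
                  "Grant 1234567"]:
    print(candidate, "->", funder_name_errors(candidate) or "ok")
```

A four-digit year such as the one in "Horizon 2020" passes under this reading, while a grant number pasted into the name field does not.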
Getting the growth we need
As of today, 198 publishers deposit funding data with Crossref. This amounts to about 3.5% of Crossref’s membership (although it’s a larger proportion of our total deposits). We need more publishers to deposit funding data so that funding data search can become a truly useful tool for the community. There’s no sign-up process or additional fee: read about how to get started, and take a look at our best practices for depositing funding data.
Finally, we ask you: how can we get more and better funder metadata in 2016?
This is not a rhetorical question. Please tweet your thoughts @CrossrefOrg or email your replies to info@crossref.org. You will receive something special via snail mail if you reply to us – just Crossref’s way of saying thank you.
*At the time of posting, our database is re-indexing and the “asserted-by” tags are still filtering through to the API. Check back in a day or two for the full picture.