This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
We talk so much about more and better metadata that a reasonable question might be: what is Crossref doing to help?
Members and their service partners do the heavy lifting to provide Crossref with metadata, and we don’t change what is supplied to us. One reason is that members can and often do change their records (important note: updated records do not incur fees!). However, we do a fair amount of behind-the-scenes work to check and report on the metadata, as well as to add context and relationships. As a result, some of what you see in the metadata (and some of what you don’t) is facilitated, added or updated by Crossref.
Much of the work is automated but some of it still requires manual intervention (sound familiar?). Here’s an overview:
Before registration
Our open APIs allow Crossref metadata to be used throughout research and scholarly communications systems and services, before and after records are registered with us. Anyone who has used a search function in something like a manuscript submission system, instead of hand-keying or copying and pasting the information, will appreciate how these integrations reduce the time, effort and likelihood of errors in collecting metadata well before it gets to Crossref.
For example, it’s very common for members to use the metadata to add DOIs to reference lists when preparing deposits. Of course, new members first need a prefix (and a memberID and name, but more on that later) in order to register content. We also provide a suffix generator to help in constructing DOIs. If you’re not sure how best to make use of existing metadata in deposits, we’ve got a few options for you. Questions are welcome.
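To make the reference-list example concrete, here is a minimal sketch of how a pre-deposit lookup might be built against the public REST API. It assumes the `/works` route with the `query.bibliographic` parameter and the `score` field returned with search results; the helper names and the score threshold are our own illustrative choices, not a recommended cut-off.

```python
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org"


def reference_match_url(citation_text, rows=1, mailto=None):
    """Build a /works query URL asking for the closest bibliographic match."""
    params = {"query.bibliographic": citation_text, "rows": rows}
    if mailto:
        # Optional contact address; assumption: you want to identify yourself.
        params["mailto"] = mailto
    return f"{API_BASE}/works?{urlencode(params)}"


def best_doi(response_json, min_score=60.0):
    """Return the DOI of the top hit, or None if nothing scores high enough.

    min_score is a hypothetical threshold chosen for illustration only.
    """
    items = response_json.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    if top.get("score", 0.0) < min_score:
        return None
    return top.get("DOI")
```

Fetching the URL is left to whichever HTTP client you already use. A low-scoring top hit is treated as "no match" so that a doubtful DOI is never added to a reference.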
We don’t often put it this way but we should: Crossref members rely on the metadata as much as, if not more than, the rest of the community. More and better metadata directly benefits our members.
Upon registration
There are a number of ways we work with the metadata when deposits are received.
Checking for uniqueness: In order to avoid duplicate records, we check to make sure that a title or work hasn’t been registered before. Depending on what we find, a conflict report or failed registration may result.
Adding DOIs to references: When references come to us without DOIs, we’ll try to match and add them.
ORCID auto-update: We automatically update authors’ ORCID records (with their permission of course) whenever deposits include their ORCID iDs.
Preprint to VoR reports: We compare title information and provide notifications of matching records to members depositing preprints, to help them fulfill their obligation to link to Versions of Record (VoRs), where they exist.
Relationships: Like preprint to VoR links, components are another kind of relationship. These might be supplementary material such as figures we can link to the ‘parent’ record.
Funding data: When members register only a funder name as part of the information on who funded the work, we’ll try to match it to its identifier from the Funder Registry, to support better linking between funders and works.
Timestamps: We add date-times for first created and last updated to member-supplied timestamps.
Count of references: That’s right, we count all the references for each record that includes them and add the total to the metadata.
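Several of these additions can be seen in a work record returned by the REST API. The sketch below assumes the field names `reference-count`, `created`, `reference`, `funder` and `doi-asserted-by` as they appear in REST API JSON. The helper name and the sample record are made up for illustration and are not real registered metadata.

```python
def crossref_added_fields(work):
    """Summarise parts of a work record that Crossref adds or matches."""
    references = work.get("reference", [])
    funders = work.get("funder", [])
    return {
        "doi": work.get("DOI"),
        # Total number of references deposited with the record.
        "reference_count": work.get("reference-count", len(references)),
        # Reference DOIs matched by Crossref rather than supplied by the member.
        "reference_dois_matched_by_crossref": sum(
            1 for ref in references if ref.get("doi-asserted-by") == "crossref"
        ),
        # Funder identifiers matched by Crossref from a funder name.
        "funder_ids_matched_by_crossref": sum(
            1 for funder in funders if funder.get("doi-asserted-by") == "crossref"
        ),
        # Date-time the record was first created.
        "created": work.get("created", {}).get("date-time"),
    }


# Hypothetical record, shaped like a REST API work, for illustration only.
sample_work = {
    "DOI": "10.5555/example",
    "reference-count": 2,
    "created": {"date-time": "2024-01-15T09:30:00Z"},
    "reference": [
        {"key": "ref1", "DOI": "10.5555/aaa", "doi-asserted-by": "crossref"},
        {"key": "ref2", "DOI": "10.5555/bbb", "doi-asserted-by": "publisher"},
    ],
    "funder": [
        {"name": "Example Funder", "DOI": "10.5555/funder-example",
         "doi-asserted-by": "crossref"},
    ],
}

summary = crossref_added_fields(sample_work)
```

In this made-up record, one of the two reference DOIs and the single funder identifier are marked as asserted by Crossref, which is how matched values are told apart from those the member supplied.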
After registration
Once registered, we check, report on and update metadata in a few ways.
Link checking: We email each member a monthly Resolution Report with details of the number of failed and successful resolutions for their DOIs. If someone in the community reports a DOI that isn’t registered, we email the member a DOI Error Report.
Citation counts and matches: Citation counts for records of members participating in our Cited-by service are openly available in our REST API. The matching citations themselves are available to members, for their own records only.
Title transfers: Title, prefix and DOI transfers are common and require assistance from our team.
MemberID: It’s not uncommon for members to have more than one prefix. The memberID means users of the REST API can query for records associated with all of a member’s prefixes.
Digital preservation: We handle the infrequent but critical updates of URLs that are necessary when titles are triggered for digital preservation. We also preserve the metadata itself, with both CLOCKSS and Portico.
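As a rough illustration of the memberID and citation count points, the sketch below builds REST API URLs for all of a member’s works and for a single prefix, then totals citation counts per prefix. It assumes the `/members/{id}/works` and `/prefixes/{prefix}/works` routes and the `prefix` and `is-referenced-by-count` fields in work records. The function names, member ID and sample items are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org"


def member_works_url(member_id, rows=20):
    """URL for works across every prefix belonging to one member."""
    return f"{API_BASE}/members/{member_id}/works?{urlencode({'rows': rows})}"


def prefix_works_url(prefix, rows=20):
    """URL for works registered under a single prefix."""
    return f"{API_BASE}/prefixes/{prefix}/works?{urlencode({'rows': rows})}"


def citations_by_prefix(items):
    """Total the is-referenced-by-count values of work records per prefix."""
    totals = defaultdict(int)
    for item in items:
        # Fall back to the part of the DOI before the first slash.
        prefix = item.get("prefix") or item["DOI"].split("/", 1)[0]
        totals[prefix] += item.get("is-referenced-by-count", 0)
    return dict(totals)


# Hypothetical items, as might appear in message.items of a response.
sample_items = [
    {"DOI": "10.5555/a1", "prefix": "10.5555", "is-referenced-by-count": 3},
    {"DOI": "10.5555/a2", "prefix": "10.5555", "is-referenced-by-count": 4},
    {"DOI": "10.9999/b1", "is-referenced-by-count": 1},
]

totals = citations_by_prefix(sample_items)
```

Querying by member rather than by prefix means one request path covers a member with several prefixes, which is the convenience the memberID is meant to provide.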
Of course, since records are often redeposited with updates (note, deposit fees are only charged once per record), some of these processes on our side are repeated as necessary.
This list isn’t exhaustive and other needs and opportunities will emerge. For example, we are looking at matching to add ROR IDs, as we do for funder IDs, and doing some research into how we might determine and assert subject classifications at the work level. If you’re interested in more about this kind of work, you’ll want to read this recent post by my Labs colleague Dominika on matching grants to outputs.
Get in touch if you have questions or for more information.