This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
The Crossref graph of the research enterprise is growing at an impressive rate of 2.5 million records a month - scholarly communications of all stripes and sizes. Preprints are one of the fastest growing types of content. While preprints may not be new, the growth may well be: ~30% for the past 2 years (compared to article growth of 2-3% for the same period). We began supporting preprints in November 2016 at the behest of our members. When members register them, we ensure that: links to these publications persist over time; they are connected to the full history of the shared research results; and the citation record is clear and up-to-date.
As of May 24, 2018, we had 44,388 works (see API query with a json viewer) registered as posted content. Today that number is over 150k. Preprints are part of this record type category, which is meant to house scholarly outputs that have been posted online and are intended for publication in the future.
For a more granular view, see the monthly stats captured by Jordan Anaya in PrePubMed. This data is based on a slightly different set of preprint repositories, though both show the same trends.
The figure below shows the preprints registered with Crossref, broken down by repository.
We eagerly await our newest preprints member, Center for Open Science, who will soon be registering the preprints from their 18 community archives with us (~9k preprints total to date).
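The posted-content count quoted above comes from an API query. As a rough sketch of how such a count could be reproduced, the snippet below builds a query URL against the public Crossref REST API and reads the total from a decoded response. The helper names are made up for illustration, the use of `until-created-date` as the cut-off is an assumption about how the historical figure was obtained, and the sample response is a hand-written, truncated stand-in rather than real API output.

```python
from urllib.parse import urlencode

# Public Crossref REST API endpoint for works.
API_BASE = "https://api.crossref.org/works"


def posted_content_count_url(until_created_date=None):
    """Build a query URL that counts posted-content records.

    rows=0 asks for the summary only, so no individual records are returned.
    """
    filters = ["type:posted-content"]
    if until_created_date:
        # Optional cut-off (assumed), e.g. "2018-05-24", for a historical count.
        filters.append(f"until-created-date:{until_created_date}")
    params = {"filter": ",".join(filters), "rows": 0}
    # Keep ':' and ',' readable instead of percent-encoding them.
    return f"{API_BASE}?{urlencode(params, safe=':,')}"


def total_results(response_json):
    """Read the record count from a decoded API response."""
    return response_json["message"]["total-results"]


# Hypothetical, truncated response used only to illustrate the parsing step.
sample_response = {"status": "ok", "message": {"total-results": 44388, "items": []}}

print(posted_content_count_url("2018-05-24"))
print(total_results(sample_response))
```

Fetching the URL with any HTTP client and passing the decoded JSON to `total_results` would give the live figure, which will differ from the numbers quoted in this post.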
Metadata coverage
We accept a range of metadata for the preprints registered with us, including:
Repository name & hosting platform
Contributor names & ORCID iDs
Dates (posted, accepted)
As with all resource/record types, some metadata is required and the rest is optional. We encourage full coverage of metadata in the record where applicable and possible. So what are publishers including in their posted content records? The summary view is as follows:
Compared to all the published content registered with us over time, preprints have above-average coverage of ORCID iDs and well above-average coverage of abstracts. However, they lag significantly behind in references, license, and funding metadata. (See a summary of the full corpus stats taken two months ago in the blog post, A Lustrum over the Weekend.)
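For readers who want to check coverage on their own sample of records, here is a minimal sketch of how such percentages could be computed from REST API work records. It assumes the records use the `author` (with `ORCID`), `abstract`, `reference`, `license`, and `funder` fields; the function names and the sample records are invented for illustration and are not real deposits.

```python
def has_orcid(item):
    """True if at least one contributor on the record carries an ORCID iD."""
    return any("ORCID" in author for author in item.get("author", []))


# One check per kind of metadata discussed above.
COVERAGE_CHECKS = {
    "orcid": has_orcid,
    "abstract": lambda item: bool(item.get("abstract")),
    "references": lambda item: bool(item.get("reference")),
    "license": lambda item: bool(item.get("license")),
    "funding": lambda item: bool(item.get("funder")),
}


def coverage(items):
    """Return the percentage of records that carry each kind of metadata."""
    if not items:
        return {name: 0.0 for name in COVERAGE_CHECKS}
    return {
        name: round(100 * sum(check(item) for item in items) / len(items), 1)
        for name, check in COVERAGE_CHECKS.items()
    }


# Invented records, for illustration only.
sample_items = [
    {"author": [{"family": "A", "ORCID": "http://orcid.org/0000-0000-0000-0000"}],
     "abstract": "<jats:p>...</jats:p>"},
    {"author": [{"family": "B"}],
     "abstract": "<jats:p>...</jats:p>",
     "license": [{"URL": "https://example.org/license"}]},
    {"author": [{"family": "C"}]},
    {"reference": [{"key": "ref1"}]},
]

print(coverage(sample_items))
```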
Preprint-article pairs
Members registering preprints have an obligation to update the metadata record when a journal article is subsequently published, to clearly identify this work. This pairing is passed on to our metadata users: indexing platforms, recommendation engines, tools, and other services that pull from our APIs. (The preprint landing page also must link to the article.) As such, the preprint-article pairings are amassing as each week passes. We currently have a total of 12,983 (json) preprints connected to articles. The figure below provides the counts based on repository.
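Metadata users can pick up the pairing described above from the work records themselves. The sketch below assumes each preprint record exposes the link under a `relation` object with an `is-preprint-of` list, which is how the REST API is understood to represent it. The function name is hypothetical, and the DOIs are placeholders on the 10.5555 test prefix.

```python
def preprint_article_pairs(items):
    """Collect (preprint DOI, article DOI) pairs from a list of work records."""
    pairs = []
    for item in items:
        relations = item.get("relation", {})
        for target in relations.get("is-preprint-of", []):
            # Only keep targets identified by a DOI.
            if target.get("id-type") == "doi":
                pairs.append((item["DOI"], target["id"]))
    return pairs


# Placeholder records, for illustration only.
sample_items = [
    {"DOI": "10.5555/preprint.1",
     "relation": {"is-preprint-of": [
         {"id-type": "doi", "id": "10.5555/article.1", "asserted-by": "subject"}]}},
    {"DOI": "10.5555/preprint.2"},  # no article published yet
]

print(preprint_article_pairs(sample_items))
```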
We can see from preprint Cited-by counts that researchers are indeed citing preprints in their articles. This practice is an extension of the common citation behavior to provide evidence for and credit to previous work, a natural consequence of work shared with their peers. The most highly cited preprint papers (json) as of May 24, 2018 are as follows. In some cases, a subsequent paper was published from the results shared in the preprint. These have also accrued citations in their own right and these are also indicated in the table below.
Spread of the pandemic Zika virus lineage is associated with NS1 codon usage adaptation in humans
November 25, 2015
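A ranking like the one above can be approximated from any set of posted-content records by sorting on the Cited-by count. This is a small sketch that assumes the count is exposed as `is-referenced-by-count` on each record; the function name and sample records are hypothetical. The REST API can also sort server-side with `sort=is-referenced-by-count` and `order=desc`, which avoids downloading records that will not make the list.

```python
def top_cited(items, n=10):
    """Return (DOI, cited-by count) for the n most cited records."""
    ranked = sorted(
        items,
        key=lambda item: item.get("is-referenced-by-count", 0),
        reverse=True,
    )
    return [(item.get("DOI"), item.get("is-referenced-by-count", 0)) for item in ranked[:n]]


# Placeholder records, for illustration only.
sample_items = [
    {"DOI": "10.5555/preprint.1", "is-referenced-by-count": 3},
    {"DOI": "10.5555/preprint.2", "is-referenced-by-count": 12},
    {"DOI": "10.5555/preprint.3"},  # count missing, treated as zero
]

print(top_cited(sample_items, n=2))
```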
The relationship between preprints and the succeeding publication is an interesting area that is not yet well understood. We invite the community to analyze the Crossref metadata using the REST API in concert with other datasets. For example, the citation lifecycle for these two research products has so far been a matter of speculation, without a systematic investigation into patterns and timeframes of preprint citations and those of the succeeding article across the corpus. Here, submission dates would be critical data for this research question, as publication windows vary significantly by publisher and by paper.
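As one possible starting point for this kind of analysis, the sketch below measures the gap between a preprint's posted date and the publication date of its paired article. It assumes dates arrive in the REST API's `date-parts` layout under `posted` (preprint) and `issued` (article), and it fills a missing month or day with 1, which is a simplification. Submission dates, which the paragraph above identifies as critical, are not covered here. The function names and the article date are hypothetical.

```python
from datetime import date


def date_from_parts(date_field):
    """Convert a {"date-parts": [[year, month, day]]} field to a date.

    Month and day are optional in the metadata; missing parts default to 1.
    """
    parts = date_field["date-parts"][0]
    year, month, day = (list(parts) + [1, 1])[:3]
    return date(year, month, day)


def days_to_publication(preprint, article):
    """Days between the preprint's posted date and the article's issued date."""
    posted = date_from_parts(preprint["posted"])
    published = date_from_parts(article["issued"])
    return (published - posted).days


# Placeholder records: the article date is invented for illustration.
sample_preprint = {"posted": {"date-parts": [[2015, 11, 25]]}}
sample_article = {"issued": {"date-parts": [[2016, 1]]}}

print(days_to_publication(sample_preprint, sample_article))
```

Applied across all preprint-article pairs, the resulting distribution could be compared by repository or publisher, though partial dates limit the precision of any single value.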