“Pre-prints” are sometimes neither Pre nor Print (cf. https://doi.org/10.12688/f1000research.11408.1), but they do go on to be published in journals. While researchers may have different motivations for posting a preprint, such as establishing a record of priority or seeking rapid feedback, the primary motivation appears to be timely sharing of results prior to journal publication.
So where in fact do preprints get published?
Although this is a simple question, we have not had an easy way to see how the answer varies across disciplines, preprint repositories, and journals. Until now. Crossref metadata provides not only an open and easy way to do so, but also up-to-date data for getting the latest results.
rOpenSci makin’ it sweet & easy
Crossref asks preprint repositories to update their metadata once a preprint has been published by adding the article link to its record via the “is-preprint-of” relation. As the record is processed, we make the link available in both directions, while preserving the provenance of the statement in the metadata output (“asserted-by”: “subject” or “asserted-by”: “object”). This results in bidirectional assertions in the Crossref REST API, where search engines, analytics providers, indexes, etc. can get from the preprint to the article (“is-preprint-of”) as well as from the article to the preprint (“has-preprint”), making both easier to find, cite, link, assess, and reuse.
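As a rough sketch of how these bidirectional assertions can be read, the snippet below walks the relation block of a single work record and collects the DOIs linked in either direction. The record shape used here (a `relation` object keyed by relation type, whose entries carry `id`, `id-type`, and `asserted-by`) follows the field names quoted above but should be treated as an assumption. The helper name `preprint_links` and the sample DOIs are hypothetical.

```python
from typing import Any

# The two relation types discussed above.
PREPRINT_RELATIONS = ("is-preprint-of", "has-preprint")


def preprint_links(work: dict[str, Any]) -> list[dict[str, str]]:
    """Collect preprint/article links from one work record (assumed shape)."""
    links = []
    relations = work.get("relation") or {}
    for rel_type in PREPRINT_RELATIONS:
        for target in relations.get(rel_type, []):
            # Only follow links that are expressed as DOIs.
            if target.get("id-type") != "doi":
                continue
            links.append({
                "source": work.get("DOI", ""),
                "relation": rel_type,
                "target": target["id"],
                # Provenance: who made the statement ("subject" or "object").
                "asserted_by": target.get("asserted-by", "unknown"),
            })
    return links


# Hypothetical article record whose preprint link was asserted by the preprint.
sample_article = {
    "DOI": "10.5555/article.1",
    "relation": {
        "has-preprint": [
            {"id-type": "doi", "id": "10.5555/preprint.1", "asserted-by": "object"}
        ]
    },
}

print(preprint_links(sample_article))
```

Keeping the `asserted-by` value alongside each link preserves the provenance of the statement, so a consumer can tell whether the link was deposited on the record itself or inferred from the record at the other end.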
So without further delay, let’s look at the 20 journals with the highest number of preprints associated with their articles (data from August 21, 2018):
| Publisher | Journal | Count |
| --- | --- | --- |
| PeerJ | PeerJ | 1184 |
| Springer Nature | Scientific Reports | 394 |
| eLife | eLife | 375 |
| PLOS | PLOS ONE | 338 |
| Proceedings of the National Academy of Sciences | PNAS | 205 |
| PLOS | PLOS Computational Biology | 196 |
| Springer Nature | Nature Communications | 187 |
| PLOS | PLOS Genetics | 169 |
| The Genetics Society of America | Genetics | 168 |
| Oxford University Press | Nucleic Acids Research | 148 |
| Oxford University Press | Bioinformatics | 138 |
| The Genetics Society of America | Genetics | 120 |
| The Genetics Society of America | G3: Genes, Genomes, Genetics | 104 |
| Cold Spring Harbor Laboratory | Genome Research | 104 |
| Oxford University Press | Molecular Biology and Evolution | 100 |
| MDPI AG | Energies | 98 |
| MDPI AG | Sensors | 96 |
| Springer Nature | BMC Genomics | 92 |
| MDPI AG | International Journal of Molecular Sciences | 86 |
| JMIR Publications | Journal of Medical Internet Research | 83 |
This list has not been normalized or weighted based on the size of the journal. The following observations are informed speculations, as we can only infer so much from the raw data:
Disciplinary practice: Preprints are an established part of disciplinary practice for about half of the journals represented on the list. Certain communities, such as genetics and computational fields, have been early adopters of preprints. As such, we see higher rates of preprint-to-article publication in the journals that publish their work.
Partnerships: Partnerships that facilitate submission from the preprint repository directly to a publisher or peer review service (e.g., the bioRxiv B2J program) make it easier for researchers to move seamlessly from sharing a preprint to submitting their journal article manuscript.
Tie-ins: A quarter of the journals on the list are run by publishers that also operate a preprint service and have been able to tie together both arms of publishing. This removes barriers to journal article submission in the same manner as integrations between repositories and publishers, but does so as a single party.
Publisher support and treatment: We also see that strong proponents and early partners of preprint repositories tend to have higher counts. Some publishers have been more outspoken in their welcome of preprints, such as PNAS. Sometimes this support also comes in the form of special treatment. In the process of crafting editorial policy on publishing results previously posted in a preprint, some journals have carved out particular affordances in their publication workflow and content delivery streams that may contribute to the higher counts of articles. For example, Nature Research displays the preprints of submitted articles under consideration: https://nature-research-under-consideration.nature.com/.
Mega-journals: Mega-journals such as Scientific Reports and PLOS ONE have not discouraged preprints. As such, and due to the size of their publication output, they have easily found a place among the higher counts on the list.
Taking a closer look
One major consideration in these results concerns what’s missing from the data. The gaps fall into two camps: incomplete member data, and incomplete membership coverage.
We have been working with our members to deposit preprints using the proper record type, and to provide links to published articles in their metadata. However, not all have yet done so (e.g., SSRN), leading to holes in our research nexus graph, which in turn detract from the completeness of the data.
We celebrate the preprint repositories that meet the requirement to update their metadata when an article is published from a preprint, thereby populating the map with critical bridges between preprints and articles. Crossref participation benefits not only the content owner, but also the membership at large and all the systems across the research ecosystem powered by Crossref metadata.
Lastly, this data is dependent on the coverage of the preprint repositories that register content with us. We are thrilled that the Center for Open Science, our newest preprints addition, which represents 21 community repositories, has recently filled in swaths of the map. But there remain dead zones in the research graph from repositories that are not Crossref members (e.g., arXiv). Their disciplines, as a result, are underrepresented in these results.
Everyone dive in!
As to the question of “where do preprints get published?”, anyone can in fact answer it based on the metadata Crossref collects and provides to the community as an open infrastructure provider. We encourage the community to explore and analyze the data further, together with other available datasets, to glean more insights into how scholarly communication is changing with the growth of preprints. For example, the results across all journals represented can be weighted based on the number of articles published by each journal.
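One possible way to do that weighting, offered only as a sketch, is to divide each journal’s count of preprint-linked articles by its total article output for the same period. The three linked counts below come from the list earlier in this post. The totals are placeholder values invented purely for illustration and are not real publication figures, and `preprint_share` is a hypothetical helper name.

```python
def preprint_share(linked: dict[str, int], totals: dict[str, int]) -> dict[str, float]:
    """Return preprint-linked article counts as a fraction of total output."""
    shares = {}
    for journal, count in linked.items():
        total = totals.get(journal, 0)
        if total <= 0:
            continue  # skip journals whose total output is unknown
        shares[journal] = count / total
    return shares


# Counts taken from the list earlier in this post.
linked_counts = {"PeerJ": 1184, "Scientific Reports": 394, "eLife": 375}

# PLACEHOLDER totals, invented for illustration only; substitute real figures.
placeholder_totals = {"PeerJ": 5000, "Scientific Reports": 50000, "eLife": 4000}

shares = preprint_share(linked_counts, placeholder_totals)
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
for journal, share in ranked:
    print(f"{journal}: {share:.1%}")
```

With real totals in place of the placeholders, a ranking by share rather than by raw count would show how much of the list reflects journal size and how much reflects preprint uptake.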
Crossref data is open for all to examine and reuse through our REST API. Please dive in and share your findings with us!