This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider depositing citation types with their records.
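To make the change concrete, here is a minimal sketch of a typed citation built with Python's standard library. The ‘type’ attribute comes from the description above; the exact value spelling ("dataset"), the helper names, and the placeholder citation text are assumptions for illustration, and the schema namespace is left out for brevity. Please check the 5.4.0 schema documentation for the allowed values before depositing.

```python
import xml.etree.ElementTree as ET


def build_citation(key, citation_type, text):
    """Build one <citation> element carrying the new 'type' attribute.

    The value passed as citation_type (e.g. "dataset") is an assumed
    spelling; the schema defines the authoritative list of values.
    """
    citation = ET.Element("citation", {"key": key, "type": citation_type})
    unstructured = ET.SubElement(citation, "unstructured_citation")
    unstructured.text = text
    return citation


def build_citation_list(citations):
    """Wrap (key, type, text) tuples in a <citation_list> element."""
    citation_list = ET.Element("citation_list")
    for key, citation_type, text in citations:
        citation_list.append(build_citation(key, citation_type, text))
    return citation_list


if __name__ == "__main__":
    # Placeholder references, not real works.
    refs = [
        ("ref1", "dataset", "Placeholder Author (2023). Placeholder dataset title."),
        ("ref2", "software", "Placeholder Author (2022). Placeholder software title."),
    ]
    print(ET.tostring(build_citation_list(refs), encoding="unicode"))
```

A citation without an identifier, such as the dataset above, still tells metadata users what kind of object to look for when matching it.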
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
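Because both kinds of identifier can now appear in the same place, a workflow may need to tell them apart. The sketch below is one hedged way to do that in Python. The ROR pattern follows the format ROR documents for its IDs as far as we recall it (a leading 0, six base32 characters, two check digits), and Funder IDs are assumed to be DOIs under the 10.13039 prefix. The function name is hypothetical, and the check digits are not verified.

```python
import re

# Assumed ROR format: leading 0, six Crockford base32 characters, two digits.
# This checks the shape only; it does not validate the checksum.
ROR_PATTERN = re.compile(
    r"^(?:https?://)?(?:ror\.org/)?(0[a-hj-km-np-tv-z0-9]{6}[0-9]{2})$"
)

# Funder Registry identifiers are DOIs under the 10.13039 prefix.
FUNDER_DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/)?(10\.13039/\S+)$"
)


def classify_funder_id(identifier):
    """Return 'ror', 'funder_registry', or 'unknown' for an identifier string."""
    value = identifier.strip().lower()
    if ROR_PATTERN.match(value):
        return "ror"
    if FUNDER_DOI_PATTERN.match(value):
        return "funder_registry"
    return "unknown"


if __name__ == "__main__":
    for candidate in (
        "https://ror.org/02twcfp32",
        "https://doi.org/10.13039/100000001",
        "Example Foundation",
    ):
        print(candidate, "->", classify_funder_id(candidate))
```

A check like this only confirms the shape of an identifier; confirming that it resolves to the intended organisation still requires a lookup against the relevant registry.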
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list is reviewed periodically, and countries may be added or removed as economic situations change.
We are often asked who uses Crossref metadata and for what. One common use case is researchers in bibliometrics and scientometrics (among other fields) doing meta-analyses on the entire corpus of records. As we pass the 10-year mark for the Funder Registry and 5 years of funders joining Crossref as members to register their grants, it’s worth a look at some recent research that focuses specifically on funding information. After all, there is funding behind so much scholarly work that it seems obvious it would be routinely documented in the scholarly record. But it often isn’t, and that’s a problem. These sources make clear the need for accurate funding information and the problems that the lack of it creates.
First, a few notes for context on these sources and the issues they discuss:
The share of records with funding information reached about 25% as of 2021. Not every registered item is the result of funded research, but the share that is funded is surely much higher than 25%, so there is considerable room for improvement. The authors cite publishers that omit funding information as well as those that include it routinely. Overall, society publishers are at the top of the list of those that do it well.
Three of the four sources found that, in some cases, funding information in the metadata was hard to confirm against the original sources. This initially surprised me, though less so once I thought about the strange nature of metadata workflows.
The complexity of fully and correctly acknowledging multiple sources of funding in any given publication is a recurring theme.
All of the sources mention the need for manual work in analyzing funding and publication information.
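For readers who want a rough version of the coverage figure in the first note, here is a minimal Python sketch against the Crossref REST API. It assumes the `has-funder`, `from-pub-date`, and `until-pub-date` filters and the `total-results` field of the response; the function names are hypothetical. It is not the method used in the papers below, and the numbers it returns will change as records are added and updated.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.crossref.org/works"


def count_url(filters):
    """Build a /works URL that returns only a count (rows=0) for the filters."""
    filter_str = ",".join(f"{name}:{value}" for name, value in filters.items())
    return f"{API}?{urlencode({'filter': filter_str, 'rows': 0})}"


def fetch_total(url):
    """Fetch the URL and return the number of matching records."""
    with urlopen(url) as response:
        return json.load(response)["message"]["total-results"]


def coverage_percent(with_funding, total):
    """Percentage of records that carry funding information."""
    if total == 0:
        raise ValueError("total must be greater than zero")
    return 100.0 * with_funding / total


def funding_coverage(from_date, until_date):
    """Share of records published in a date range that name a funder.

    Makes two network requests; dates are strings such as "2021-01-01".
    """
    date_range = {"from-pub-date": from_date, "until-pub-date": until_date}
    total = fetch_total(count_url(date_range))
    funded = fetch_total(count_url({"has-funder": "true", **date_range}))
    return coverage_percent(funded, total)
```

Calling `funding_coverage("2021-01-01", "2021-12-31")` would give an approximate figure for items published in 2021. Note that this counts every record type, including items that were never the result of funded research.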
The first two papers are from the same 2022 issue of Quantitative Science Studies and are complementary.
Alexis-Michel Mugabushaka, Nees Jan van Eck, Ludo Waltman; Funding COVID-19 research: Insights from an exploratory analysis using open data infrastructures. Quantitative Science Studies 2022; 3 (3): 560–582. doi: https://doi.org/10.1162/qss_a_00212
This first paper tackles the timely question of determining which funders have supported publications of COVID-19 research and compares coverage of funding data in Crossref to that in Scopus and Web of Science. Even with so much urgent attention focused on the pandemic, the authors found that only 17% of publications in the COVID-focused CORD-19 database have funding identified in their Crossref records.
We’re often asked about differences in the metadata (and citation counts) between Crossref and other sources such as Scopus. In this case, both proprietary sources studied have more funder coverage.
If you are disappointed in these results or want to learn more, I encourage you to read the authors’ recommendations for improving funding data in Crossref or get in touch with us.
Bianca Kramer, Hans de Jonge; The availability and completeness of open funder metadata: Case study for publications funded by the Dutch Research Council. Quantitative Science Studies 2022; 3 (3): 583–599. doi: https://doi.org/10.1162/qss_a_00210
This next paper focuses on a set of outputs funded by the NWO (the Dutch Research Council). Since the funder is already known, the authors could look at multiple sources (Crossref and others) to see whether and where the NWO is correctly identified as the funder. This study also found better coverage in proprietary sources like Web of Science than in Crossref. Knowing that not all outputs are the result of funded research, this paper provides a new and useful baseline for comparing percentages of coverage.
Discussions of research funding so often focus on the physical and life sciences that it’s very good to see that 37% of works in this study are in the humanities and social sciences.
Borst, T., Mielck, J., Nannt, M., Riese, W. (2022). Extracting Funder Information from Scientific Papers - Experiences with Question Answering. In: , et al. Linking Theory and Practice of Digital Libraries. TPDL 2022. Lecture Notes in Computer Science, vol 13541. Springer, Cham. https://doi.org/10.1007/978-3-031-16802-4_24
Given the considerable effort required to conduct these analyses, it’s only logical to consider automating as much of the work as possible. This next paper focuses on automatic recognition of funders in economics papers in digital libraries.
An interesting complication described here is the inclusion of funding for open access fees in acknowledgments. While the authors conclude that automated text mining of funder information performs better than manual curation, they also state that manual indexing is still necessary “for a gold standard of reliable metadata.”
Finally, this concise blog post looks at ROR IDs as well as funder names and acronyms. The author shows how acronyms contribute to the need for manual analysis. He also spends some time on award numbers, one of the three funding elements publishers can (and, as we’ve seen, should) include in their metadata. Award numbers are, unfortunately, another frequent reason for additional manual work.
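To show what those three elements look like to someone analyzing the metadata, here is a minimal Python sketch that pulls the funder name, funder identifier, and award numbers out of a single work record. It assumes the record follows the shape the REST API returns, with a `funder` list whose entries may carry `name`, `DOI`, and `award` keys. The function names and the sample record are hypothetical, and entries identified by ROR ID may be represented differently.

```python
def funding_elements(work):
    """Return one dict per funder with its name, identifier, and award numbers."""
    rows = []
    for funder in work.get("funder", []):
        rows.append(
            {
                "name": funder.get("name"),
                "identifier": funder.get("DOI"),
                "awards": list(funder.get("award", [])),
            }
        )
    return rows


def missing_elements(entry):
    """List which of the three funding elements are absent from an entry."""
    return [key for key in ("name", "identifier", "awards") if not entry[key]]


if __name__ == "__main__":
    # Hypothetical record for illustration only, not a real work.
    sample_work = {
        "funder": [
            {"name": "Example Funder A", "DOI": "10.13039/000000000", "award": ["ABC-123"]},
            {"name": "EFB"},  # acronym only: no identifier, no award number
        ]
    }
    for entry in funding_elements(sample_work):
        print(entry["name"], "is missing:", missing_elements(entry) or "nothing")
```

The second funder in the sample is the kind of entry that pushes analysts back to manual work: an acronym with no identifier and no award number to disambiguate it.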
A common theme: More metadata needed
Though this research collectively paints a fairly dim picture of the current availability, completeness, and accuracy of funding information in publication metadata, all is not lost. This is a good opportunity to point out the value and availability of grant records, since unique, persistent identifiers for grants (yes, DOIs for grants) paired with more and better funding metadata from publishers go a very long way toward realizing the vision of the Research Nexus. And it certainly would make things a whole lot easier for the researchers who use this open metadata to analyze the scholarly record for the rest of us.