This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
Tony’s recent thread on making DOIs play nicely in a linked data world has raised an issue I’ve meant to discuss here for some time. A lot of the thread is predicated on the idea that Crossref DOIs are applied at the abstract “work” level. Indeed, that is what it currently says in our guidelines. Unfortunately, this is a case where theory, practice and documentation all diverge.
When the Crossref linking system was developed, it was focused primarily on facilitating persistent linking amongst journals and conference proceedings. The system was quickly adapted to handle books and, more recently, to handle working papers, technical reports, standards and “components”, a catchall term used to refer to everything from individual article images to database records.
In practice the content outside of the core journals and conference proceedings has accounted for relatively low volume. However, we expect that over the next few years this will change and that books and databases will increasingly drive the future growth in Crossref’s citation linking services. Interestingly, these record types all share characteristics that make them substantially different from the journals and conference proceedings that we have hitherto focused on.
Both books and databases introduce new challenges to the technology and policies of our citation linking service. The challenges revolve around two areas:
Structure: Both books and databases can have complex structures, and the publishers of this content are likely to require granular identification of these content substructures along with a mechanism for documenting the relationship between these substructures (e.g. this section is part of this chapter, which is part of this monograph, which is part of this series).
Versioning: Unlike typical journals and conference proceedings, books and database records sometimes change over time.
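The “part of” chain described under Structure can be pictured as a set of records that each point to their parent. The following is only a minimal sketch under stated assumptions: the `ContentItem` class, its fields and the DOI strings are hypothetical illustrations, not Crossref’s schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentItem:
    """A hypothetical identified piece of content (not a Crossref schema type)."""
    doi: str                                # made-up identifier, illustration only
    kind: str                               # e.g. "series", "monograph", "chapter"
    parent: Optional["ContentItem"] = None  # the item this one is part of

    def ancestry(self) -> List[str]:
        """Return the kinds from this item up to its top-level container."""
        chain = []
        item = self
        while item is not None:
            chain.append(item.kind)
            item = item.parent
        return chain


# Build the example from the text: section -> chapter -> monograph -> series.
series = ContentItem("10.5555/series", "series")
monograph = ContentItem("10.5555/mono", "monograph", parent=series)
chapter = ContentItem("10.5555/mono.ch3", "chapter", parent=monograph)
section = ContentItem("10.5555/mono.ch3.s2", "section", parent=chapter)

print(" is part of ".join(section.ancestry()))
```

Each level carries its own identifier here, which is the kind of granular identification publishers are likely to ask for. Versioning is deliberately left out of the sketch.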
When confronted with the issues of structure and versioning, publishers are often tempted to take shortcuts and simply assign DOIs at the highest-level structure and to the “work” instead of a particular “manifestation” or version of that work. Indeed, section 5.5 of Crossref’s [DOI Name Information and Guidelines][2] recommends this. But this approach could have a negative impact on the integrity of the scholarly citation record that Crossref is attempting to maintain.
Fundamentally, Crossref DOIs are aimed at providing a persistent online citation infrastructure for scholarly and professional publishers. Consequently, decisions about where to apply Crossref DOIs should be guided by common expectations about the way in which citations work. Citations are typically used to credit ideas or provide evidence. A reader follows a citation in order to obtain more detail or to verify that an author is accurately representing the item cited. A rule of thumb is that a reader has a reasonable expectation that when they follow a citation, they will be taken to what the author saw when creating the citation. Any divergent behavior could result in the reader concluding that the author was misrepresenting the item cited. A further implication of this is that any changes to content that are likely to affect the crediting or interpretation of the content should result in that changed content getting a new Crossref DOI.
Typically, this means that Crossref DOIs should probably be assigned at the expression level and different expressions should be assigned different Crossref DOIs. This is because assigning a Crossref DOI at the higher “work” level is generally not granular enough to guarantee that a reader following the citation will see what the author saw when creating the citation. For example, one translation of a work might be substantially different from another translation of the same work. Similarly, a draft version of a work might be substantially different from the final published version of the work. In each case, resolving a citation to a different expression of the work than the expression that was originally cited might result in the reader interpreting the content differently than the citing author.
In general, different “equivalent manifestations” of the same work can safely be assigned the same Crossref DOI. So, for instance, the HTML-formatted version of an article and the PDF-formatted version of an article can almost always be assigned the same Crossref DOI. Any differences between the two are unlikely to affect the crediting of, or reader’s interpretation of, the work. But sometimes it is even possible that different manifestations of an expression will differ enough to merit different Crossref DOIs. For instance, a semantically enhanced version of an article might require new crediting (e.g. the parties responsible for adding the semantic information) and the resulting semantic enhancement may conceivably alter the reader’s interpretation of the article.
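The reasoning above can be condensed into a rough heuristic: a change merits a new Crossref DOI when it is likely to affect crediting or interpretation. The sketch below is illustrative only. The `Change` class and `needs_new_doi` function are hypothetical names, and deciding whether a given change really affects crediting or interpretation remains a human judgment that the code simply takes as input.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of how content differs from what was cited."""
    affects_crediting: bool       # e.g. new contributors must be credited
    affects_interpretation: bool  # e.g. a new translation, or draft vs. final


def needs_new_doi(change: Change) -> bool:
    """Heuristic from the text: assign a new DOI when the change is likely
    to affect the crediting or the interpretation of the content."""
    return change.affects_crediting or change.affects_interpretation


# The cases discussed in the text, with the judgments the text suggests.
html_to_pdf = Change(affects_crediting=False, affects_interpretation=False)
new_translation = Change(affects_crediting=True, affects_interpretation=True)
semantic_enhancement = Change(affects_crediting=True, affects_interpretation=False)

for name, change in [
    ("HTML vs. PDF", html_to_pdf),
    ("new translation", new_translation),
    ("semantic enhancement", semantic_enhancement),
]:
    print(name, "-> new DOI" if needs_new_doi(change) else "-> same DOI")
```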
Unfortunately, there is no hard and fast rule about where and when to assign new Crossref DOIs. Instead there is only a guideline, namely:
“Assign new Crossref DOIs to content in a way that will ensure that a reader following the citation will see something as close to what the original author cited as is possible.”
The implications of this for publishers are important, especially when they are assigning DOIs to protean record types. For instance, it may mean that:
Book publishers should be expected to keep old editions of books available for link resolution purposes.
Publishers of content that can change rapidly (e.g. by the second) should provide facilities for creating frozen, archived snapshots of content for citation purposes.
All publishers of protean content should issue guidelines instructing researchers on when it is appropriate to cite a work, manifestation or version.
Crossref needs to actively consider these issues as publishers start assigning Crossref DOIs to more dynamic types of content. Minimally, we should be able to provide publishers with recommendations on how to make dynamic content citable. We may even want to consider enshrining certain types of behavior in our terms and conditions so as to ensure the future integrity of the scholarly citation record.