This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, Datacite and Crossref launched in 2019 that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
This D-Lib paper by Altman and King looks interesting: “A Proposed Standard for the Scholarly Citation of Quantitative Data”. (And thanks to Herbert Van de Sompel for drawing attention to the paper.) Gist of it (Sect. 3) is
_“We propose that citations to numerical data include, at a minimum, six required components. The first three components are traditional, directly paralleling print documents. … Thus, we add three components using modern technology, each of which is designed to persist even when the technology changes: a unique global identifier, a universal numeric fingerprint, and a bridge service. They are also designed to take advantage of the digital form of quantitative data.
An example of a complete citation, using this minimal version of the proposed standards, is as follows:
**Micah Altman; Karin MacDonald; Michael P. McDonald, 2005, “Computer Use in Redistricting”,
So the abbreviated citation (author, date, title, unique ID) is supplemented by a UNF which fingerprints the data. UNFs would appear to be a sort of super MD5 in providing a signature of the data content independent of the data serialization to a filestore.
_“Thus, we add as the fifth component a Universal Numeric Fingerprint or UNF. The UNF is a short, fixed-length string of numbers and characters that summarize all the content in the data set, such that a change in any part of the data would produce a completely different UNF. A UNF works by first translating the data into a canonical form with fixed degrees of numerical precision and then applies a cryptographic hash function to produce the short string. The advantage of canonicalization is that UNFs (but not raw hash functions) are format-independent: they keep the same value even if the data set is moved between software programs, file storage systems, compression schemes, operating systems, or hardware platforms.
…
Finally, since most web browsers do not currently recognize global unique identifiers directly (i.e., without typing them into a web form), we add as the sixth and final component of the citation standard a bridge service, which is designed to make this task easier in the medium term.”_
Certainly looks promising. I’m not sure if there’s any other contestants in this arena.