This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple: we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider making this change to deposit citation types with their records.
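To make the idea concrete, here is a minimal sketch of what a typed citation could look like when built programmatically. It assumes the familiar `citation_list` / `citation` / `unstructured_citation` element names and the `key` attribute. The `type` values used below ("dataset", "software") and the reference text are illustrative placeholders only, so check the 5.4.0 schema for the actual list of allowed values before depositing anything.

```python
import xml.etree.ElementTree as ET


def build_citation(key, text, citation_type=None):
    """Return a <citation> element, optionally carrying a 'type' attribute.

    The helper name and its arguments are hypothetical; only the idea of a
    'type' attribute on the citation comes from the schema update.
    """
    citation = ET.Element("citation", {"key": key})
    if citation_type is not None:
        # Placeholder value: consult the schema for the permitted values.
        citation.set("type", citation_type)
    unstructured = ET.SubElement(citation, "unstructured_citation")
    unstructured.text = text
    return citation


citation_list = ET.Element("citation_list")
citation_list.append(
    build_citation("ref1", "Example Author (2024). Example survey data.", "dataset")
)
citation_list.append(
    build_citation("ref2", "Example Author (2023). Example analysis tool.", "software")
)

xml_text = ET.tostring(citation_list, encoding="unicode")
print(xml_text)
```

Neither of these citations needs an identifier for the type to be useful: a metadata user reading the record can tell straight away that `ref1` points at a dataset and look for a match accordingly.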
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019, it’s a joint initiative led by the California Digital Library, DataCite, and Crossref that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
So, back on the old XMP tack. The simple vision from the XMP spec is that XMP packets are embedded in media files and transported along with them - and as such are relatively self-contained units (see Fig. 1).
Fig. 1 - Media files with fully encapsulated descriptions.
But this is too simple. Some preliminary considerations lead us to see why we might want to reference additional (i.e. external) sources of metadata from the original packet:
PDFs
PDFs are tightly structured and as such it can be difficult to write a new packet, or to update an existing packet. One solution proposed earlier is to embed a minimal packet which could then reference a more complete description in a standalone packet. (And in turn this standalone packet could reference additional sources of metadata.)
Images
While it is considerably simpler to write into web-delivery image formats (e.g. JPEG, GIF, PNG), only metadata pertinent to the image itself is likely to be embedded. Also of interest is the work from which the image is derived, which is most likely to be presented externally to the image as a standalone document. (And in turn this standalone document could reference additional sources of metadata.)
Thus the two cases - PDF documents and images - are not dissimilar. Fig. 2 shows a “wall-to-wall” XMP architecture whereby the standalone metadata documents for the work and for additional sources are expressed in XMP.
Fig. 2 - XMP “wall-to-wall” architecture.
Fig. 3 presents a variant on this theme whereby additional sources are presented as generic RDF/XML. (In the most general case only RDF need be assumed, the serialization being a matter of choice.)
Fig. 3 - XMP authority metadata with references to generic RDF/XML
And finally, Fig. 4 shows the most extreme case whereby XMP is used merely to “bootstrap” RDF descriptions for media objects. The XMP is used to embed a minimal description into the media file with references to a fuller work description and to additional sources which are presented as generic RDF/XML. That is, the metadata descriptions use generic RDF/XML exclusively and only resort to the idiomatic RDF/XML employed by XMP for embedding descriptions into binary structures.
Fig. 4 - XMP “bootstrap” only - metadata descriptions proper are generic RDF/XML.
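As a rough illustration of the “bootstrap” idea, the sketch below builds a minimal XMP packet that carries little more than an identifier plus links out to fuller descriptions, and then reads those links back. Using `rdfs:seeAlso` as the linking property is my assumption here (nothing above settles on a particular property), and the function names, identifier and URLs are all made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return an ElementTree-style qualified name, e.g. {uri}local."""
    return "{%s}%s" % (NS[prefix], local)


def build_bootstrap_packet(identifier, see_also_urls):
    """Build a minimal XMP packet: one identifier plus outbound references."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    ET.SubElement(desc, q("dc", "identifier")).text = identifier
    for url in see_also_urls:
        # Assumption: rdfs:seeAlso marks the next hop on the metadata trail.
        ET.SubElement(desc, q("rdfs", "seeAlso"), {q("rdf", "resource"): url})
    body = ET.tostring(xmpmeta, encoding="unicode")
    # Standard xpacket wrapper processing instructions around the payload.
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )


def trail_heads(packet):
    """Return the external metadata URLs referenced from an embedded packet."""
    root = ET.fromstring(packet)
    return [el.get(q("rdf", "resource")) for el in root.iter(q("rdfs", "seeAlso"))]


packet = build_bootstrap_packet(
    "doi:10.5555/example",  # hypothetical identifier
    ["https://example.org/work.rdf", "https://example.org/extra.rdf"],
)
print(packet)
print(trail_heads(packet))
```

The embedded packet stays small enough to write into a constrained binary structure, while the documents at the referenced URLs are free to use whatever RDF serialization suits them.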
If I were to choose, I might opt for the scenario presented in Fig. 3, but the scenarios in both Figs. 2 and 4 leave room for thought. Such a hybrid solution may be a means to bridge two different concerns:
Generic RDF/XML for unconstrained descriptions.
Idiomatic RDF/XML (aka XMP) for embedding the head of a metadata trail.
I’m not sure that I see the XMP spec loosening up any time soon to accommodate generic RDF/XML. Nor, likewise, is XMP likely to be provided (or even tolerated) down the metadata trail. And the metadata is not going to be fully encapsulated within a media file. The media file will merely encapsulate the head of the metadata trail.