This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already, and we hope they will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative, launched in 2019 and led by the California Digital Library, DataCite, and Crossref, that fulfils the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
(Update - 2007.09.15: Clean forgot to add in the rdf: namespace to the examples for xmp:Identifier in this post. I’ve now added in that namespace to the markup fragments listed. Also added in a comment here which shows the example in RDF/XML for those who may prefer that over RDF/N3.)
So, as a preliminary to reviewing how a fuller metadata description of a Crossref resource may best be fitted into an XMP packet for embedding into a PDF, let’s just consider how a DOI can be embedded into XMP. And since it’s so much clearer to read let’s just conduct this analysis using RDF/N3. (Life is too short to be spent reading RDF/XML or C++ code. :~)
(And further to Chris Shillum’s comment on my earlier post Metadata in PDF: 2. Use Cases where he notes that Elsevier are looking to upgrade their markup of DOI in PDF to use XMP, I’m really hoping that Elsevier may have something to bring to the party and share with us. A consensus rendering of DOI within XMP is going to be of benefit to all.)
Within an XMP packet our first idea might be to include the DOI using the Dublin Core (DC) schema element dc:identifier in minimalist fashion:
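A minimal sketch of such a statement in RDF/N3, assuming the conventional `dc:` prefix binding for the Dublin Core element set:

```n3
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# The current document (<>) carries the bare DOI as a plain string value.
<> dc:identifier "10.1038/nrg2158" .
```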
This simply says that the current document (denoted by the empty URI “<>”) has a string property “10.1038/nrg2158” of type identifier from the dc (or Dublin Core) schema, which is itself identified by the URI http://purl.org/dc/elements/1.1/.
Now, since this is just a DOI and the wider public cannot be expected to know about DOIs, it would surely be better to present the DOI in URI form (doi:) as
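Sketched in RDF/N3 under the same assumed `dc:` prefix binding, the URI form would still be carried as a string value:

```n3
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# Same property as before; the value now has the doi: scheme prefix
# but remains a quoted string (type Text in XMP terms).
<> dc:identifier "doi:10.1038/nrg2158" .
```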
Aside: This shows up a limitation of XMP where the DC schema property value for dc:identifier is fixed as type Text. The natural way to express the above in RDF/N3 would be as:
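A sketch of that more natural form, again assuming the usual `dc:` prefix binding, with the value written as a URI reference in angle brackets rather than as a quoted string:

```n3
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# The object is a resource (a URI), not a literal.
<> dc:identifier <doi:10.1038/nrg2158> .
```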
which says that the value is a URI (type URI in XMP terms), not a string (type Text in XMP terms). We either have to flout the XMP specification or else live with this restriction. We’ll opt for the latter for now.
But, the XMP Spec deprecates the use of dc:identifier since the context is not specific. (Note that that’s what was just discussed above. The limitation is built into XMP which builds on RDF but does not fully endorse the RDF world view.) Instead the XMP Spec recommends using xmp:Identifier since the context can be set using a qualified property as:
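One plausible rendering in RDF/N3 is sketched below. The prefix bindings are assumptions: `xmp:` and `xmpidq:` are taken to be the XMP basic schema and identifier qualifier namespaces, and the bag member is written as a blank node holding the value plus its `xmpidq:Scheme` qualifier.

```n3
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp:    <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .

<> xmp:Identifier [
    a rdf:Bag ;
    # A single bag member: the identifier string, qualified by its scheme.
    rdf:_1 [
        rdf:value "doi:10.1038/nrg2158" ;
        xmpidq:Scheme "URI"
    ]
] .
```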
This says the string “doi:10.1038/nrg2158” belongs to the scheme “URI”.
But this is the unregistered URI form (doi:), so should we be using instead the registered form (info:)? Well, turns out that this construct for xmp:Identifier is an rdf:Bag so we can include more than one term. How about using this construct then:
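A sketch of a two-member bag follows, assuming the same prefix bindings as for any qualified xmp:Identifier and assuming the registered form is written as `info:doi/` followed by the DOI:

```n3
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp:    <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .

<> xmp:Identifier [
    a rdf:Bag ;
    # Member 1: the unregistered doi: form.
    rdf:_1 [
        rdf:value "doi:10.1038/nrg2158" ;
        xmpidq:Scheme "URI"
    ] ;
    # Member 2: the registered info: form of the same DOI.
    rdf:_2 [
        rdf:value "info:doi/10.1038/nrg2158" ;
        xmpidq:Scheme "URI"
    ]
] .
```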
OK, that takes care of the XMP direction to use xmp:Identifier, but, while deprecated by XMP, we note that back in the real world folks will be looking at the DC elements which is the schema with the greatest purchase. So, why not also add in a dc:identifier element such as would be used typically for DOI in citations? How about this:
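As a sketch, and assuming the `doi:` string form is what a citation would typically carry, the DC element could simply sit alongside the XMP one (prefix bindings are assumed as before):

```n3
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix xmp:    <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .

# Plain DC identifier for consumers that only look at Dublin Core.
<> dc:identifier "doi:10.1038/nrg2158" .

# Qualified XMP identifiers, as recommended by the XMP Spec.
<> xmp:Identifier [
    a rdf:Bag ;
    rdf:_1 [ rdf:value "doi:10.1038/nrg2158" ;      xmpidq:Scheme "URI" ] ;
    rdf:_2 [ rdf:value "info:doi/10.1038/nrg2158" ; xmpidq:Scheme "URI" ]
] .
```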
Right, so we’ve taken care of the identifiers. But maybe there’s something missing? There’s no link to the DOI proxy. For widest applicability we should not assume prior knowledge of the DOI system. Perhaps we could include this link using the property dc:relation? Seems feasible, though I would really like to get some feedback on this. Any ideas?
So here, then, is a fairly full and complete expression of DOI within the XMP packet.
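Pulling the pieces together, a sketch of the whole description in RDF/N3 might read as follows. Several details are assumptions rather than settled practice: the prefix bindings, the `http://dx.doi.org/` proxy address used for the link, and the choice to write dc:relation as a single-member rdf:Bag of strings (on the understanding that XMP treats that property as an unordered array of Text).

```n3
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix xmp:    <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .

<>
    # DC identifier, as typically used for DOI in citations.
    dc:identifier "doi:10.1038/nrg2158" ;

    # Link to the DOI proxy, for readers with no prior knowledge of DOI.
    dc:relation [
        a rdf:Bag ;
        rdf:_1 "http://dx.doi.org/10.1038/nrg2158"
    ] ;

    # Qualified XMP identifiers: unregistered (doi:) and registered (info:) forms.
    xmp:Identifier [
        a rdf:Bag ;
        rdf:_1 [ rdf:value "doi:10.1038/nrg2158" ;      xmpidq:Scheme "URI" ] ;
        rdf:_2 [ rdf:value "info:doi/10.1038/nrg2158" ; xmpidq:Scheme "URI" ]
    ] .
```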
(Of course, this is all premised on having freedom in writing out the XMP packet. If one is dependent on commercial applications to write out the packet then things may be different. Actually, they will be very different. They may not even be workable.)