This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
I wanted to make some remarks about the “Ease of use” and “Learn curve” ratings which I gave in the ORE/POWDER comparison table that I blogged about here the other day. It may seem that I came out a little harsh on ORE and a little easy on POWDER. I just wanted to explain my reasons for calling it that way. (By the way, the revised comparison table includes a qualification to those ratings.)
My primary interest was from the perspective of a data provider rather than a data consumer. What does it take to get a resource description document (“resource map”, “description resource” or “sitemap”) ready for publication?
To look at POWDER first, it defines two sets of semantics: an “operational semantics” which is embodied in the simple XML that is intended as the primary publication vehicle, and a “formal semantics” embodied in the RDF/OWL document that would typically be generated by a POWDER processor.
The operational semantics (XML) document requires minimal RDF understanding (and arguably none at all): it only requires that URI resources be organized into groups by pattern matching, and that metadata be attached to those groups.
URI patterns are specified using any of the following XML elements for inclusive patterns:
These are turned into corresponding regular expressions by a POWDER processor which then emits RDF/OWL classes using those expressions as property restrictions on set membership. But a publisher is not required to understand this transformation nor the formal semantics generated from the simple XML document that was authored.
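To give a feel for that transformation, here is a minimal sketch of how a list of hosts might be turned into a regular expression and used to test group membership. It is only an illustration under my own assumptions: the function names `hosts_to_regex` and `in_group` are hypothetical, the matching rule (a host matches if it equals a listed host or is a subdomain of one) is a simplification, and the expressions a real POWDER processor emits are more involved.

```python
import re
from urllib.parse import urlsplit


def hosts_to_regex(hosts: str) -> re.Pattern:
    """Compile a whitespace-separated host list into one regular expression.

    Simplified rule (an assumption for illustration): a host matches if it
    equals a listed host or is a subdomain of one.
    """
    alternatives = "|".join(re.escape(h) for h in hosts.split())
    return re.compile(rf"^(?:[^.]+\.)*(?:{alternatives})$", re.IGNORECASE)


def in_group(uri: str, host_pattern: re.Pattern) -> bool:
    """Return True if the URI's host falls inside the group."""
    host = urlsplit(uri).hostname or ""
    return host_pattern.match(host) is not None


# A group covering two hypothetical hosts.
pattern = hosts_to_regex("example.org example.com")
```

The point is that the publisher only writes the host list; the regular expression is the processor's business.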
Now, as to metadata. Resource group descriptors are either free text (tags) or properties from a published namespace. For example, the property name from a namespace ex: would be added in one of two ways, depending on whether its value is a simple literal string (“value”, say) or a resource URI:
While technically this is RDF/XML, it hardly requires, I think, any great knowledge of RDF; a knowledge of XML namespaces alone would be sufficient.
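As a rough sketch of the two forms just described (a literal value as element text, and a resource URI as an `rdf:resource` attribute), the following builds both with Python's standard `xml.etree.ElementTree`. The namespace URI bound to ex: and the resource URI are made-up placeholders, and the helper names are hypothetical; only the RDF namespace is the real one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/vocab#"  # hypothetical namespace for ex:

# Ask ElementTree to serialize with the familiar prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def literal_property(value: str) -> ET.Element:
    """ex:name carrying a simple literal string as element text."""
    elem = ET.Element(f"{{{EX_NS}}}name")
    elem.text = value
    return elem


def resource_property(uri: str) -> ET.Element:
    """ex:name pointing at a resource URI via rdf:resource."""
    return ET.Element(f"{{{EX_NS}}}name", {f"{{{RDF_NS}}}resource": uri})


literal_xml = ET.tostring(literal_property("value"), encoding="unicode")
resource_xml = ET.tostring(
    resource_property("http://example.org/thing"), encoding="unicode"
)
```

Printing `literal_xml` and `resource_xml` gives an `ex:name` element with text content and an empty `ex:name` element with an `rdf:resource` attribute, respectively, each with its namespace declarations attached.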
And that’s about it – all that is required for publication of a POWDER “description resource” document. (The guidelines for discovery mechanisms of a POWDER document might also need to be consulted.)
So, on that basis I would judge POWDER to be at most “medium” on the “Learn curve”. However, as soon as the mapping to the formal semantics (POWDER-S) using RDF/OWL is considered, then that learn curve rating would automatically swing to “high”.
Now, ORE on the other hand is a straightforward RDF application. What does make ORE a bit of a challenge are the following two aspects:
1. concept of named aggregation
2. abstract data model - no fixed bindings
Well, the first aspect is what ORE is all about – its USP – and what it gives us beyond the simpler POWDER approach of merely describing resource bundles. Still, it’s a concept that needs to be grokked. All too easy to take it for granted.
It is the second aspect that may make ORE appear to be “difficult”. It does not prescribe a single binding or set of bindings but provides an abstract data model. That means that a prospective user must endeavour to understand something of the model before deploying.
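To make “abstract data model” a little more concrete, here is a small sketch that represents a resource map and its aggregation as plain subject/predicate/object triples, with no commitment to any serialization. The `ore:` term names used (`ResourceMap`, `Aggregation`, `describes`, `aggregates`) are taken from the ORE vocabulary as I understand it; the example URIs and the helper names are made up, and the N-Triples-style output is just one possible binding chosen for illustration.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, resource_uris):
    """Build the core triples: a resource map describes a named aggregation,
    and the aggregation aggregates a set of resources."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        (rem_uri, ORE + "describes", aggregation_uri),
    ]
    triples += [(aggregation_uri, ORE + "aggregates", r) for r in resource_uris]
    return triples


def to_ntriples(triples):
    """One possible binding: an N-Triples-style line per triple (all URIs)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical URIs, for illustration only.
triples = resource_map_triples(
    "http://example.org/rem.rdf",
    "http://example.org/aggregation",
    ["http://example.org/article.pdf", "http://example.org/data.csv"],
)
```

The same list of triples could equally be written out as RDF/XML or embedded in Atom or RDFa, which is exactly why the publisher has to understand the model first and pick a binding second.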
But enough of that. Because who really reads instruction manuals anyway? So to deploy there are user guides available for one standalone document format (RDF/XML), and two carrier document formats (Atom, RDFa). That means right there that the publisher must either embrace RDF/XML or learn how to weave it into an existing document markup. (By the way, it should be remarked that there is an excellent [primer][3] available - as there is also for POWDER - and user guides for each of the formats.)
So that, I think, warrants the “high” rating for ORE on the learn curve, and the corresponding “low” ease of use. But that is not to say that the two initiatives are in any competition or that one should be favoured over the other. They serve different purposes. And yet they may also have compatibilities, as the previous [mapping of ORE in POWDER][4] attempts to show. I’ll leave that task for other commentators.