This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
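As a rough illustration of what a typed citation might look like, here is a minimal Python sketch that builds a citation element carrying a type attribute. The `build_citation` helper is hypothetical, and the type value `dataset` is an assumption drawn from the examples above rather than a confirmed schema value, so check the published schema before relying on it.

```python
import xml.etree.ElementTree as ET


def build_citation(key, text, citation_type=None):
    """Build a citation element, optionally tagged with a type.

    Hypothetical helper for illustration only. The accepted values for
    the type attribute are defined by the schema and are not verified here.
    """
    attrs = {"key": key}
    if citation_type is not None:
        attrs["type"] = citation_type  # assumed value, e.g. "dataset"
    citation = ET.Element("citation", attrs)
    ET.SubElement(citation, "unstructured_citation").text = text
    return citation


# Made-up citation text, used only to show the shape of the element.
example = build_citation("ref1", "Example Author (2024). Example dataset.", "dataset")
print(ET.tostring(example, encoding="unicode"))
```

A citation built without a type simply omits the attribute, so existing deposits would keep the same shape.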
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019, it is a joint initiative led by the California Digital Library, DataCite, and Crossref that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
This month we have officially released a new version of our input metadata schema. As well as walking through the latest additions, I’ll also describe here how we’re starting to develop a new streamlined and open approach to schema development, using GitLab and some of the ideas under discussion going forward.
What’s included in version 4.4.2
The latest schema as of August 2019 is version 4.4.2 and this release now includes:
Abstract support for dissertations and reports, and support for multiple abstracts wherever abstracts are allowed
Support for multiple dissertation authors
A new acceptance_date element added to journal article, book, book chapter, and conference paper record types
“Pending publication” is the term we’ve coined for the phase where a manuscript has been accepted for publication but where the publisher needs to communicate a DOI much earlier than most article metadata is available. Some members asked for the ability to register and assign DOIs prior to online publication, even without a title, so this allows members to register a DOI with minimal metadata, temporarily, before online publication. There is of course no obligation to use this feature.
It’s worth calling out the addition of acceptance_date too. This is a key attribute that is heavily requested by downstream metadata users like universities. Acceptance dates allow people to report on outputs much more accurately, so we do encourage all members to start including acceptance dates in their metadata. It’s highly appreciated!
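For members wondering what this might look like in a deposit, here is a minimal Python sketch that builds an acceptance_date element. It assumes the element takes month, day, and year children in the same layout as the schema's other date elements, and it leaves out namespaces and the surrounding record. The `build_acceptance_date` helper is hypothetical, so check the schema files before using it.

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_acceptance_date(accepted):
    """Build an acceptance_date element from a Python date.

    Hypothetical helper. The month/day/year child layout is an assumption
    based on how other date elements in the schema are structured.
    """
    element = ET.Element("acceptance_date")
    ET.SubElement(element, "month").text = f"{accepted.month:02d}"
    ET.SubElement(element, "day").text = f"{accepted.day:02d}"
    ET.SubElement(element, "year").text = str(accepted.year)
    return element


xml_text = ET.tostring(build_acceptance_date(date(2019, 8, 1)), encoding="unicode")
print(xml_text)
```

Generating the element from a real date object, rather than from loose strings, helps avoid impossible dates reaching the deposit.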
Schema files public on GitLab
I’ve added our latest schema to a new GitLab repository. There you’ll find the schema files, some documentation, and the opportunity to suggest enhancements. The schema has been released as bundle 0.1.1 and also includes our new Grant metadata schema for members that fund research.
The schema has been available in some form for months but at this point we consider it ‘officially’ released to kick off our new but necessary practice of formal schema releases. Any forthcoming updates will be added to the next version.
Schema management process
We’ve been adding sets of metadata and new record types over the years, but we also need a defined process for the small but vital pieces of metadata that you need to provide to and retrieve from our metadata records. If you’re wondering what our procedure for updating our schema is, you are not alone! We have not had a formal process, instead relying on ad-hoc requests from our membership and working groups. Our release management and schema numbering have also not been consistent.
Going forward, I will ensure that all forthcoming versions of our metadata schema are posted as a draft on GitLab for review and comment, and the final version will be officially released via GitLab as well.
It’s important to note that when we talk about “the schema”, we generally mean the input schema specifically, i.e. what members of Crossref can register about the content they produce. As always, the output for retrieving that metadata is subject to separate development plans for our Metadata APIs. I’m working with our technical team so we can develop and introduce an ‘end-to-end’ approach that in future doesn’t treat the input and the output as such separate considerations.
What’s next
Many of the updates in this latest release have been in the works for some time. Changes to our metadata both large and small are considered carefully, but I’d like to do this in a transparent and cooperative way with our community.
I recently set up the “Metadata Practitioners Interest Group” and we’ve just had our second call. A big topic was how to best manage the ideas and requests from the community. The ability for public comments on GitLab is a first step.
This most recent update contains a mix of long-term projects and updates to keep our metadata current and useful. Other changes that are under discussion will require more development on our end. But stay tuned for more information about forthcoming changes, as well as information about how you can contribute.