This year, metadata development is one of our key priorities, and we’re making a start with the release of version 5.4.0 of our input schema, which includes some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple: we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other cited item that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know that many publishers, particularly journal publishers, already collect this information, and we hope they will consider depositing citation types with their records.
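To make the idea concrete, here is a rough sketch of what a typed citation could look like when a deposit is built programmatically. It is illustrative only: the element names follow the general shape of our deposit XML, and the ‘type’ attribute value is an assumption, so check the 5.4.0 schema documentation for the exact markup.

```python
# Illustrative sketch only: the exact attribute name and allowed values for
# citation typing should be taken from the 5.4.0 schema documentation.
import xml.etree.ElementTree as ET

citation_list = ET.Element("citation_list")

# A dataset citation with no DOI of its own, typed so that downstream
# metadata users can still tell what kind of object is being cited.
citation = ET.SubElement(citation_list, "citation", {"key": "ref1", "type": "dataset"})
ET.SubElement(citation, "unstructured_citation").text = (
    "Ocean temperature profiles, 2014-2020, Example Oceanographic Institute."
)

print(ET.tostring(citation_list, encoding="unicode"))
```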
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making it available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
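If you want to work with the data file programmatically, reading it can be as simple as the sketch below. It assumes the archive unpacks into a directory of gzipped JSON files with records under an “items” key; check the notes accompanying the release you download for the actual layout.

```python
# A minimal sketch of walking the extracted data file. The directory name and
# file layout here are assumptions about how the archive unpacks.
import glob
import gzip
import json

count = 0
for path in glob.glob("public-data-file/*.json.gz"):
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        batch = json.load(fh)
    for record in batch.get("items", []):
        count += 1  # do something useful with each metadata record here

print(f"Read {count} records")
```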
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders anywhere they currently use Funder IDs in their metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative of the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
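If you want to sanity-check a ROR ID before reusing it as a funder identifier, the public ROR API can help. The sketch below uses the v1 endpoint and an example identifier as we understand them; adjust it to whichever API version and organisation you actually need.

```python
# A small sketch: look up a ROR record before reusing its ID in a deposit.
# The endpoint path and response fields follow the public ROR API (v1);
# the ROR ID is just an example.
import requests

ror_id = "02twcfp32"  # example identifier
resp = requests.get(f"https://api.ror.org/organizations/{ror_id}", timeout=10)
resp.raise_for_status()
org = resp.json()

print(org.get("name"))  # organisation name
print(org.get("id"))    # canonical form, e.g. https://ror.org/02twcfp32
# The same https://ror.org/... identifier can now appear as a funder
# identifier wherever a Funder ID was used before.
```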
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of eligible countries is predominantly based on the International Development Association (IDA) list. Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
When you register your content with us, you create a metadata record for a digital object. The metadata within that record becomes an enduring, widely distributed connection to the research nexus.
Beyond basic bibliographic metadata, our requirements are minimal. We’d like to require everything, but we don’t, because:
Not all metadata fields are relevant. For example, not all journals have volumes and issues, and not all articles have funding.
Our members are not always able to send us everything, and having some metadata is better than having no metadata. For example, it’s better to have an identifier attached to basic bibliographic information than for there to be no identifier at all.
Some metadata is hard to come by. For example, digitized back issues may not have good reference lists available.
However, we hope all members will follow our metadata best practices rather than just meeting the basic requirements. This will ensure that the records and identifiers you register with us are discoverable and connected.
Principles (modeled on Metadata 20/20 principles)
Metadata 20/20 has a set of basic principles that can be applied to our metadata to ensure that it is Compatible, Complete, Credible and Curated.
Principles are aspirational - they help us define what we hope to accomplish with our metadata. So while we don’t meet all of the principles completely, they can still guide us as we move forward. Let’s take a look at the Metadata 20/20 principles one-by-one.
COMPATIBLE: provide a guide to content for machines and people
So, metadata must be as open, interoperable, parsable, machine-actionable, and human-readable as possible.
How are we compatible?
The metadata provided to Crossref is made freely and openly available through our APIs.
Crossref metadata is provided in both JSON and XML formats. Our JSON and ‘UNIXSD’ XML formats are comprehensive and contain all metadata registered with us.
We also provide limited metadata tailored for specific purposes via content negotiation (BibTeX, RIS, RDF); see the sketch after this list.
We try to make use of vocabularies and identifiers as much as possible, and allow free text only when there is no other option.
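For example, the content-negotiated formats can be requested from the DOI resolver by setting an Accept header. The media types shown are the ones commonly documented for DOI content negotiation, and the DOI below is just an example.

```python
# Content negotiation sketch: ask the DOI resolver for different
# representations of the same record.
import requests

doi = "10.5555/12345678"  # example DOI
formats = {
    "BibTeX": "application/x-bibtex",
    "RIS": "application/x-research-info-systems",
    "RDF": "application/rdf+xml",
}

for label, media_type in formats.items():
    resp = requests.get(
        f"https://doi.org/{doi}", headers={"Accept": media_type}, timeout=10
    )
    print(label, resp.status_code)
    print(resp.text[:200])  # first few characters of each representation
```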
What more can we do?
Provide a JSON schema to make REST API outputs easier to ingest (a sketch of what such validation could enable follows this list).
Adopt and support existing and new standards that define the metadata we collect.
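We don’t publish such a JSON schema yet, so the example below is purely hypothetical: it invents a tiny schema for a work record and checks a REST API response against it with the jsonschema library, simply to illustrate the kind of ingestion check this would make possible.

```python
# Hypothetical illustration: the schema below is invented for the example,
# not an official Crossref artefact.
import requests
from jsonschema import validate

work_schema = {
    "type": "object",
    "required": ["DOI", "title", "type"],
    "properties": {
        "DOI": {"type": "string"},
        "title": {"type": "array", "items": {"type": "string"}},
        "type": {"type": "string"},
    },
}

resp = requests.get("https://api.crossref.org/works/10.5555/12345678", timeout=10)
work = resp.json()["message"]
validate(instance=work, schema=work_schema)  # raises ValidationError on mismatch
print("record matches the sketch schema")
```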
COMPLETE: reflect the content, components and relationships as published
So, metadata must be as complete and comprehensive as possible.
How are we complete?
We aim to collect all metadata that is relevant to describing and using the scholarly content registered with us, and work to make it possible for members to send this metadata to us.
What more can we do?
A lot - this is our biggest challenge. We need to:
Make it easy for members to send metadata to us.
Make it easy for members to assess the metadata they have sent to us.
Evolve our schema (or evolve beyond an XML schema) quickly, to support new types of content and metadata segments.
CREDIBLE: enable content discoverability and longevity
So, metadata must be of clear provenance, trustworthy and accurate.
How are we credible?
Our metadata is provided to us by our members, and we don’t curate or clean up the metadata in any way. We do insert some metadata into our outputs, such as DOI matches for citations and recursive relationships, and we clearly flag those pieces as having been inserted by Crossref.
This means, good or bad, metadata accuracy depends on the quality of metadata provided by our members.
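You can see these provenance flags in the REST API today: reference entries carry a doi-asserted-by value indicating whether the DOI was supplied by the publisher or matched by Crossref. A small sketch, using an example DOI (not every record exposes a reference list):

```python
# Sketch: inspect the provenance flags on a work's references via the REST API.
from collections import Counter

import requests

doi = "10.5555/12345678"  # example DOI; substitute one with a public reference list
resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
references = resp.json()["message"].get("reference", [])

# Counts "publisher" vs "crossref" assertions, plus references with no DOI.
print(Counter(ref.get("doi-asserted-by", "no-doi") for ref in references))
```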
What more can we do?
We can:
Facilitate reporting and correction of metadata errors identified by metadata users.
Create tools to help members assess their metadata quality.
CURATED: reflect updates and new elements
So, metadata must be maintained over time.
How are we curated?
An important obligation for our members is to keep metadata up to date - for some, this may mean periodically updating registered URLs; for others, ensuring that license and Crossmark data is current. (Find out more about maintaining metadata.)
What more can we do?
Assess and report URLs that are broken (a minimal sketch of such a check follows this list).
Provide tools to allow members to assess their license metadata.
Make sure that DOIs that move from member to member are maintained.
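As a very rough illustration of the first point in this list, a link check could be as simple as resolving a DOI and reporting whether the registered URL still responds. The sketch below is illustrative only and the DOI is an example.

```python
# Minimal link-health sketch: follow a DOI's redirects and report whether the
# landing page answers. Some sites reject HEAD requests, so a real check
# might need to fall back to GET.
import requests

def check_doi_resolution(doi: str) -> bool:
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=10)
    return resp.status_code < 400

print(check_doi_resolution("10.5555/12345678"))
```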