This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019, it is a joint initiative led by the California Digital Library, DataCite, and Crossref that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
Metadata Plus snapshots give access to our 160,104,382 metadata records in a single file, which is an easy way to retrieve an up-to-date copy of our records. The files are made available via a /snapshots route in the REST API, which offers a compressed .tar file (tar.gz) containing the full extract of the metadata corpus in either JSON or XML format.
How to access snapshots
New snapshots are created each month, available by the 5th day, providing all records up to and including the previous month.
If you’re looking for the most up-to-date snapshot (all records up to and including the previous month), you can use the following URLs which will always alias to the current month:
XML output: https://api.crossref.org/snapshots/monthly/{YYYY/MM}/all.xml.tar.gz
Please note that XML snapshots are available in UNIXSD format only.
As snapshots are available to Metadata Plus users only, you will need to identify yourself in the request by using a “Crossref-Plus-API-Token” HTTP header with your access token. The example below shows how this should be formatted, with XXX replaced by your token:
Crossref-Plus-API-Token: Bearer XXX
The files are very large (>42 GB), so they may take a while to download depending on the speed of your internet connection.
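As a rough illustration of the request described above, the following Python sketch builds the URL for a specific month, attaches the token header, and streams the response to disk so the archive is never held in memory. The function names, the destination path, and the chunk size are hypothetical choices made for this example; only the URL pattern and the header format come from the text above.

```python
import shutil
import urllib.request

# URL pattern from the documentation above; year and month fill {YYYY/MM}.
SNAPSHOT_URL = (
    "https://api.crossref.org/snapshots/monthly/"
    "{year:04d}/{month:02d}/all.xml.tar.gz"
)


def build_snapshot_request(year: int, month: int, token: str) -> urllib.request.Request:
    """Build an authenticated request for one month's XML snapshot."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    url = SNAPSHOT_URL.format(year=year, month=month)
    # Header name and "Bearer XXX" value follow the format shown above.
    headers = {"Crossref-Plus-API-Token": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers)


def download_snapshot(request: urllib.request.Request, destination: str,
                      chunk_size: int = 1024 * 1024) -> None:
    """Stream the archive to disk in chunks (hypothetical helper)."""
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
```

A call such as download_snapshot(build_snapshot_request(2024, 3, "XXX"), "all.xml.tar.gz") would then fetch the file, with XXX replaced by your own token.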
Please contact us if you’re unable to access snapshots.
Keeping your data current
For applications where you want to keep a copy of our metadata records current, use OAI-PMH Plus (as described above) or the REST API to query for new records at your preferred interval.
Snapshots FAQs
Are snapshots for ‘all time’ available?
Snapshots are available for current and previous quarters. With each new snapshot, we may remove files older than the current and previous quarters. For example, on 1 April the files from the previous October, November, and December may be removed.
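The retention rule above can be checked with a little date arithmetic. The sketch below is one possible reading of it, assuming calendar quarters (January to March, April to June, and so on) and assuming a snapshot is identified by the year and month in its URL; the function names are hypothetical.

```python
from datetime import date


def quarter_index(year: int, month: int) -> int:
    """Number calendar quarters consecutively so they can be subtracted."""
    return year * 4 + (month - 1) // 3


def may_have_been_removed(snapshot_year: int, snapshot_month: int, today: date) -> bool:
    """True if the snapshot is older than the current and previous quarters."""
    age_in_quarters = (
        quarter_index(today.year, today.month)
        - quarter_index(snapshot_year, snapshot_month)
    )
    return age_in_quarters > 1
```

With today set to 1 April, this flags the previous October, November, and December as possibly removed, which matches the example given above, while January to March are still expected to be available.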
I’m seeing a 404 error when I request the URL
If you’re looking for the current month, this may be because the archive hasn’t yet been created for that month. Snapshots are usually available by the 5th of each month.
If you’re looking for a month that’s more than 6 months old, it may be that the snapshot has been deleted. If the archive you’re looking for isn’t particularly new or old and you’re still seeing a 404 error, please contact us.
I’m seeing a 401 error when I request the URL
Snapshots are only available to Metadata Plus users. This 401 message means that the system doesn’t recognise you as a Metadata Plus user. If you’re already a Metadata Plus user, make sure you’re using your correct token in the header of your query. If you’re still having problems, please contact us.
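If you are scripting downloads, the two error cases above can be turned into readable messages. This is a minimal sketch; the function name and the message wording are hypothetical, and only the meaning of the 401 and 404 responses is taken from the answers above.

```python
def explain_snapshot_error(status: int) -> str:
    """Map an HTTP status from the snapshots route to a likely cause."""
    if status == 401:
        # The system does not recognise the caller as a Metadata Plus user.
        return ("401: not recognised as a Metadata Plus user; "
                "check the token in the Crossref-Plus-API-Token header.")
    if status == 404:
        # Either the archive is not built yet or it has been removed.
        return ("404: snapshot not found; it may not have been created yet "
                "(usually available by the 5th) or it may have been deleted.")
    return f"{status}: unexpected response; contact support if it persists."
```

When downloading with urllib, the status is available as the code attribute of the urllib.error.HTTPError that is raised, so explain_snapshot_error(error.code) can be logged before retrying or contacting support.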
I need a full snapshot mid-month
Snapshot archives are provided at the start of each month. The archive contains all the registered content received by Crossref up until that time. (Really? Yeah, all of it.) If you need a snapshot mid-month, you should download and ingest the latest archive and then harvest and ingest the registered content that has changed since then.
To get the registered content that has changed since an archive was created, use OAI-PMH Plus or the REST API. For example, if the archive was created on January 31, 2018 then the OAI-PMH Plus harvest’s initial URL is
It is important to use the created date and not the completed date. It takes time to build the archive, so changes will occur during the build. Using the created date ensures those changes are harvested too.
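For the REST API route mentioned above, a query for records changed since the archive's created date might be built as in the sketch below. It assumes the REST API's from-index-date filter is the appropriate one for picking up changed records and uses cursor-based paging on the /works route; treat both as assumptions to check against the REST API documentation. The function name is hypothetical, and this does not reproduce the OAI-PMH Plus URL.

```python
from datetime import date
from urllib.parse import urlencode

WORKS_URL = "https://api.crossref.org/works"


def changes_since_url(archive_created: date, rows: int = 1000) -> str:
    """Build the first page URL for records indexed since the created date."""
    params = {
        # Use the archive's created date, not its completed date.
        "filter": f"from-index-date:{archive_created.isoformat()}",
        # "*" starts a cursor; later pages use the cursor value returned.
        "cursor": "*",
        "rows": rows,
    }
    return f"{WORKS_URL}?{urlencode(params)}"
```

For an archive created on January 31, 2018, changes_since_url(date(2018, 1, 31)) produces the starting URL; each response would then supply the cursor to use for the next page.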