We operate an OAI-PMH service for the distribution of metadata in XML. This system is based on the OAI-PMH version 2.0 repository framework and implements the interface as documented in the OAI-PMH specification.
The service interface can be used in different ways by public metadata users, Metadata Plus subscribers, and Crossref members.
Public metadata users: we allow public access to two OAI verbs, ListSets and ListIdentifiers, which allow for discovery of available information.
Metadata Plus subscribers: access to the OAI verbs GetRecord and ListRecords requires a subscription to our Metadata Plus service. Users of this service are provided with tokens that identify them. The token is sent in an HTTPS authorization header of the form:
Crossref-Plus-API-Token: Bearer FullTokenHere
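As a minimal sketch of how the header above might be attached to a request in Python: the base URL `https://oai.crossref.org/oai` and the helper name `plus_headers` are assumptions made for illustration, and `FullTokenHere` stands in for a real token. Nothing is sent over the network here; the request object is only constructed.

```python
import urllib.request

# Assumed base URL, used for illustration only; confirm it before use.
OAI_BASE_URL = "https://oai.crossref.org/oai"


def plus_headers(token: str) -> dict:
    """Return the header that identifies a Metadata Plus subscriber."""
    return {"Crossref-Plus-API-Token": f"Bearer {token}"}


# Build (but do not send) a ListRecords request carrying the token.
url = OAI_BASE_URL + "?verb=ListRecords&set=J:10.1002:4"
request = urllib.request.Request(url, headers=plus_headers("FullTokenHere"))

# Passing `request` to urllib.request.urlopen() would perform the call.
print(request.full_url)
```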
Crossref members may also use OAI-PMH to retrieve their own deposited metadata through our deposit harvester, using their member account.
Set hierarchy
We support selective harvesting according to sets defined by the hierarchy of publisher and title. Setspecs are formatted as follows:
record type:prefix:pubID (for example: J:10.1002:4 = journal content by the publisher Wiley, journal title Applied Organometallic Chemistry; learn more about publication IDs)
record type:prefix (for example: J:10.1002, journals owned by publisher Wiley)
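The two setspec shapes above can be sketched as a small parser and formatter. The `SetSpec` type and `parse_setspec` function are hypothetical names introduced for illustration; the sketch also accepts a bare record type (such as `B`), on the assumption that a set may name a record type alone.

```python
from typing import NamedTuple, Optional

RECORD_TYPES = {"J", "B", "S"}  # journals, books and similar, series


class SetSpec(NamedTuple):
    record_type: str
    prefix: Optional[str] = None
    pub_id: Optional[str] = None

    def __str__(self) -> str:
        # Join only the parts that are present, in hierarchy order.
        parts = [self.record_type, self.prefix, self.pub_id]
        return ":".join(p for p in parts if p is not None)


def parse_setspec(text: str) -> SetSpec:
    """Split 'type[:prefix[:pubID]]' into its components."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"malformed setspec: {text!r}")
    if parts[0] not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {parts[0]!r}")
    return SetSpec(*parts)


spec = parse_setspec("J:10.1002:4")
print(spec.prefix, spec.pub_id)   # the publisher prefix and publication ID
print(SetSpec("J", "10.1002"))    # formats back to 'J:10.1002'
```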
The from and until dates in a request capture when a record was deposited or updated, not the published date of the item. This means a request for records from yesterday through today will return all records added or changed between then and now, regardless of the publication dates included in the records.
Set record types are:
J for journals
B for books, conference proceedings, dissertations, reports, and datasets
S for series
The default set for both ListIdentifiers and ListRecords is J (journals). A set (B for books or conference proceedings, S for series) must be specified to retrieve non-journal data.
With the ListSets request, the set parameter is optional. Leaving off the set parameter returns only journal data, which includes a list of publishers, their journal titles, and each year of publication for which we have metadata records.
With the ListIdentifiers request, the set, from, and until parameters are optional. The from and until parameters refer to the dates when the DOIs were registered with us, not to publication dates.
Examples of requests
Request a list of DOIs registered since 2010-08-11:
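A minimal sketch of how such a request URL might be assembled in Python. The base URL `https://oai.crossref.org/oai` and the helper `oai_url` are assumptions for illustration. Standard OAI-PMH also expects a `metadataPrefix` argument on ListIdentifiers and ListRecords; its value is not stated in this section, so it is left out here and can be passed as an extra keyword argument.

```python
from urllib.parse import urlencode

# Assumed base URL, used for illustration only; confirm it before use.
OAI_BASE_URL = "https://oai.crossref.org/oai"


def oai_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL.

    `from` is a Python keyword, so pass it as `from_`; the trailing
    underscore is stripped when the query string is built.
    """
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            query[name.rstrip("_")] = value
    # Keep ':' readable so setspecs such as J:10.1002 are not escaped.
    return f"{OAI_BASE_URL}?{urlencode(query, safe=':')}"


# DOIs registered since 2010-08-11 (journals, the default set).
print(oai_url("ListIdentifiers", from_="2010-08-11"))
```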
We allow 3 concurrent initial OAI-PMH requests per user. There is no concurrency limit for follow-on requests (requests made with a resumption token). Due to the size of the repository, we strongly discourage performing a ListRecords request for the entire collection.
For the best performance, request the changes made to one publication on a given date, such as:
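As an illustrative sketch only, a request scoped to one title and one day could be built as follows. The base URL, the function name `title_changes_url`, and the chosen date are assumptions; the setspec `J:10.1002:4` reuses the example from the set hierarchy above.

```python
from urllib.parse import urlencode

# Assumed base URL, used for illustration only; confirm it before use.
OAI_BASE_URL = "https://oai.crossref.org/oai"


def title_changes_url(prefix: str, pub_id: str, day: str) -> str:
    """URL for records of one journal title deposited or updated on `day`.

    Setting from and until to the same date limits the request to that day.
    """
    query = {
        "verb": "ListRecords",
        "set": f"J:{prefix}:{pub_id}",
        "from": day,
        "until": day,
    }
    return f"{OAI_BASE_URL}?{urlencode(query, safe=':')}"


print(title_changes_url("10.1002", "4", "2017-07-06"))
```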
If you are harvesting a large amount of data and run up against our limit of 3 concurrent initial requests, we recommend requesting data by prefix for a short time frame (a few days to a week). For example, this request will give you all journal records owned by prefix 10.1234 registered or updated between 2017-07-06 and 2017-07-09:
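One way to apply this advice is to split a long harvest into short, non-overlapping date windows and issue one request per window. The sketch below assumes the base URL `https://oai.crossref.org/oai`; the helpers `date_windows` and `prefix_window_urls` are hypothetical names, and the window length is a parameter to tune.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple
from urllib.parse import urlencode

# Assumed base URL, used for illustration only; confirm it before use.
OAI_BASE_URL = "https://oai.crossref.org/oai"


def date_windows(start: date, end: date, days: int = 3) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (from, until) pairs covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


def prefix_window_urls(prefix: str, start: date, end: date, days: int = 3) -> List[str]:
    """Build one ListRecords URL per window for the journals of a prefix."""
    urls = []
    for w_from, w_until in date_windows(start, end, days):
        query = {
            "verb": "ListRecords",
            "set": f"J:{prefix}",
            "from": w_from.isoformat(),
            "until": w_until.isoformat(),
        }
        urls.append(f"{OAI_BASE_URL}?{urlencode(query, safe=':')}")
    return urls


for url in prefix_window_urls("10.1234", date(2017, 7, 6), date(2017, 7, 9), days=2):
    print(url)
```

Running the requests one window at a time keeps the number of concurrent initial requests at one, well inside the limit of three.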
Many OAI requests are too big to be retrieved in a single transaction. If a response contains a resumption token, you must make an additional request to retrieve the rest of the data. Resumption tokens remain valid for 48 hours.
The resumption token includes an expiration date 48 hours after it is issued:
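A minimal sketch of reading the token from a response and building the follow-on request. The XML below is a made-up sample shaped after the general OAI-PMH 2.0 response format (a `resumptionToken` element with an `expirationDate` attribute); the token value, DOI, timestamp, and base URL are all assumptions, and real responses may differ in detail.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
OAI_BASE_URL = "https://oai.crossref.org/oai"  # assumed, for illustration

# Hypothetical response fragment; not copied from a real reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>info:doi/10.1234/example</identifier></header>
    <resumptionToken expirationDate="2017-07-11T12:00:00Z">hypothetical-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def read_resumption_token(xml_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (token, expirationDate) or None when the list is complete."""
    root = ET.fromstring(xml_text)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None  # no token, or an empty one marking the last page
    return element.text.strip(), element.get("expirationDate")


def follow_on_url(verb: str, token: str) -> str:
    """In OAI-PMH the token replaces set/from/until in the next request."""
    return f"{OAI_BASE_URL}?{urlencode({'verb': verb, 'resumptionToken': token})}"


token, expires = read_resumption_token(SAMPLE)
print(expires)
print(follow_on_url("ListIdentifiers", token))
```

A harvesting loop would repeat this until `read_resumption_token` returns `None`, making each follow-on request before the token expires.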
Metadata Plus snapshots contain our 160,104,382-plus metadata records in a single file, offering an easy way to retrieve an up-to-date copy of our records. Snapshots are available to Metadata Plus service users.
The files are made available via a /snapshots route in the REST API which offers a compressed .tar file (tar.gz) containing the full extract of the metadata corpus in either JSON or XML formats.
OAI-PMH example files
An example application for harvesting Crossref OAI data