This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider depositing citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfils the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
Our API allows researchers to easily harvest full-text documents from all participating members, regardless of whether the content is open access or subscription. The member is responsible for delivering the full-text content requested, so open access content can simply be delivered, while subscription content is available through access control systems.
To mine our metadata, you should have a list of DOIs for the content you want to download, and a safelist of licenses that you accept. You can get a list of DOIs from citations, our metadata search, our metadata API, or another source. Then, for each DOI:
Check to see if the DOI has license and full-text details in its metadata.
Check the license against your safelist of acceptable licenses.
If you agree to the license, follow the link and download the full-text of the content item.
The absence of a license does not mean that the full-text can be used without one. Members should deposit both the license and the full-text link at the same time.
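The three checks above can be sketched as a small triage function. This is only an illustration under stated assumptions: the `Record` class, the `triage` function, the outcome labels, and the example URLs are hypothetical names chosen for this sketch and are not part of any Crossref API. It assumes you have already extracted the license URL and full-text link for each DOI.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Record:
    """Hypothetical container for the fields a mining tool needs per DOI."""
    doi: str
    license_url: Optional[str]    # None when the member deposited no license
    fulltext_url: Optional[str]   # None when the member deposited no full-text link


def triage(record: Record, safelist: Set[str]) -> str:
    """Return 'skip', 'review', or 'download' for a single record."""
    # No license or no full-text link: a missing license is not permission.
    if not record.license_url or not record.fulltext_url:
        return "skip"
    # License already accepted: the full text can be fetched.
    if record.license_url in safelist:
        return "download"
    # Unknown license: set it aside for a human to read.
    return "review"


# Example safelist; the contents are up to each researcher or institution.
SAFELIST = {"http://creativecommons.org/licenses/by/3.0/"}
```

The comparison here is an exact string match, so a real tool would probably want to normalise license URLs (scheme, trailing slash) before looking them up.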
You should be able to integrate the API with your TDM software very easily.
Step 1
Fetch the metadata: at its simplest, you can issue an HTTP GET request using a Crossref DOI and use DOI content negotiation. For example, a request for the DOI 10.5555/515151 that sends the Accept header application/vnd.crossref.unixsd+xml returns the metadata for that DOI, as well as a Link header which points to several representations of the full-text on the member’s site. In R, using the httr package, the request looks like this:
library(httr)
# Request the DOI's metadata in the unixsd XML format via content negotiation
r = content(GET('http://dx.doi.org/10.5555/515151', add_headers(Accept = 'application/vnd.crossref.unixsd+xml')))
r
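Once you have the response, the Link header has to be split into individual links before a tool can choose a representation. The following is a minimal sketch of such a parser using only the Python standard library. The sample header value, the example.org URLs, and the function names are assumptions made for illustration; the exact parameters a member sends may differ, and the parser does not handle angle brackets inside quoted values.

```python
import re

# One link looks like: <URL>; key="value"; key="value"
_LINK = re.compile(r'<([^>]+)>([^<]*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Split a Link header value into a list of dicts, one per link."""
    links = []
    for url, params in _LINK.findall(value):
        entry = {"url": url}
        for key, quoted, bare in _PARAM.findall(params):
            entry[key] = quoted or bare
        links.append(entry)
    return links


def links_of_type(links, mime_type):
    """Keep only the links whose 'type' parameter matches mime_type."""
    return [link["url"] for link in links if link.get("type") == mime_type]


# Illustrative header value, not copied from a real response.
SAMPLE_LINK_HEADER = (
    '<http://example.org/fulltext/515151.pdf>; version="vor"; '
    'type="application/pdf"; rel="item", '
    '<http://example.org/fulltext/515151.xml>; version="vor"; '
    'type="application/xml"; rel="item"'
)

if __name__ == "__main__":
    parsed = parse_link_header(SAMPLE_LINK_HEADER)
    print(links_of_type(parsed, "application/xml"))
```

Filtering on the `type` parameter lets a tool prefer XML for mining and fall back to PDF only when no structured representation is offered.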
If present, the full-text URL will also be returned in the metadata for the DOI. For instance, in our unixref schema, you would also see this in the returned metadata:
Step 2
Deciding what to do: members who enable mining through us need to register a stable license URL using the <license_ref> element. For example, this unixref extract shows that the DOI is licensed under the Creative Commons CC-BY license:
The license that the URL points to does not have to be machine-readable. Check the license against your safelist. If you agree to it, you can proceed. If you don’t agree to it, put it in a list of licenses to review later, and then add it to your safelist or your blocklist.
If a content item is under embargo, a slight complication arises: the member can use a start_date attribute on the <license_ref> element. In this example, the content item is under a proprietary license for a year after its publication date, after which it is licensed under a CC-BY license:
TDM tools can easily use a combination of the <license_ref> element(s) and the start_date attribute to determine if the content item is currently under embargo.
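As a rough illustration of that check, the sketch below picks the license in effect on a given date from a set of <license_ref> elements. The sample XML is invented for this example (including the example.org license URL and the dates), the function names are hypothetical, and the code assumes that start_date is an ISO date (YYYY-MM-DD) and that a <license_ref> without start_date applies from the beginning.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Invented sample: proprietary for the first year, CC-BY afterwards.
SAMPLE_XML = """
<program xmlns="http://www.crossref.org/AccessIndicators.xsd">
  <license_ref start_date="2013-02-03">http://example.org/proprietary-license</license_ref>
  <license_ref start_date="2014-02-03">http://creativecommons.org/licenses/by/3.0/</license_ref>
</program>
"""


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def license_in_effect(xml_text, today):
    """Return the license URL whose start_date is the latest one not after today."""
    root = ET.fromstring(xml_text)
    current = None  # (start date, license URL) of the best candidate so far
    for element in root.iter():
        if _local_name(element.tag) != "license_ref" or not element.text:
            continue
        start_attr = element.get("start_date")
        # Assumption: no start_date means the license applies from the outset.
        start = date.fromisoformat(start_attr) if start_attr else date.min
        if start <= today and (current is None or start >= current[0]):
            current = (start, element.text.strip())
    return current[1] if current else None


if __name__ == "__main__":
    print(license_in_effect(SAMPLE_XML, date.today()))
```

The URL this returns is the one to compare against your safelist; if it is not acceptable today, the later start_date values tell you when it is worth checking again.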
If you are not interested in receiving the metadata for the DOI, you can simply issue an HTTP HEAD request and you will get the Link header without the rest of the DOI record.
Step 3
Fetching the full-text: you can now perform a standard GET request on the URL to download the full-text from the member’s site. Because the bulk downloading of large amounts of data may put a strain on the member’s servers, we have defined a set of rate-limiting HTTP headers. You are not obliged to test for and act on these headers, and not all members will use them, but doing so will avoid surprises.
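One way a tool might act on those headers is to pause when the member reports that no requests remain in the current window. The sketch below is an assumption-heavy illustration: the header names CR-TDM-Rate-Limit-Remaining and CR-TDM-Rate-Limit-Reset, and the reading of the reset value as a Unix timestamp in seconds, are assumptions here and should be checked against the current documentation. Both names are passed as parameters so they are easy to change, and `seconds_to_wait` is a hypothetical helper name.

```python
import time


def seconds_to_wait(headers, now=None,
                    remaining_header="CR-TDM-Rate-Limit-Remaining",
                    reset_header="CR-TDM-Rate-Limit-Reset"):
    """Return how long to pause before the next download, in seconds.

    headers is a mapping of response header names to string values.
    Returns 0.0 when the headers are absent or cannot be parsed.
    """
    # Header names are case-insensitive, so compare them in lower case.
    lookup = {name.lower(): value for name, value in headers.items()}
    remaining = lookup.get(remaining_header.lower())
    reset = lookup.get(reset_header.lower())
    if remaining is None or reset is None:
        return 0.0  # this member does not send rate-limiting headers
    try:
        if int(remaining) > 0:
            return 0.0  # requests are still available in this window
        reset_at = float(reset)  # assumed: Unix timestamp in seconds
    except ValueError:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A download loop could call `time.sleep(seconds_to_wait(response_headers))` after each request, which costs nothing when a member does not send the headers.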
You are trying to access content from a publisher that requires you to accept a TDM license; consider modifying your tools to work with such publishers’ licenses.