UPDATE, 24 August 2021: All pools have been migrated to the new Elasticsearch-backed API, which already appears to be more stable and performant than the outgoing Solr API. Please report any issues via our Crossref issue repository in GitLab.
UPDATE, 9 August 2021: The cutovers for the polite and Plus pools are delayed again. We’re still working to ensure acceptable performance and stability before serving responses from the new application and infrastructure. Each cutover is currently delayed by one more week: the polite pool is scheduled for 17 August 2021 and the Plus pool for 24 August 2021.
UPDATE, 2 August 2021: The cutovers for the polite and Plus pools are delayed. We’ve been mirroring traffic to the new polite pool and want to ensure acceptable performance and stability before serving responses from the new application and infrastructure. Each cutover is currently delayed by one week: the polite pool is scheduled for 10 August 2021 and the Plus pool for 17 August 2021.
UPDATE, 13 July 2021: The first stage of the cutover is complete, so requests to the public pool are now being served by the new REST API. We took a slightly different approach to performing the cutover, so the “Documentation” and “Temporary domain” sections below have been updated.
Our REST API is the primary interface for anybody to fetch the metadata of content registered with us, and we’ve been working hard on a more robust REST API service that’s about to go live.
We also offer enhanced access to our APIs and other services with Metadata Plus, and we recommend it for production services and others that benefit from guaranteed up-time, a higher rate limit, and priority support from our helpful staff.
For a while now, we’ve been working to migrate the REST API from Solr to Elasticsearch and from our datacenter to a cloud platform in order to address issues of scalability and extensibility.
We’re pleased to announce that we’ll be cutting over to the Elasticsearch-backed version of the REST API over the next few weeks, beginning July 13. This cutover will occur one pool at a time: the public pool will be migrated first, followed by the polite pool on August 3 and the Plus pool on August 10 (see ’etiquette’ link above if you’re unfamiliar with our different pools). Please note the updates at the top of this post for changes to the original schedule.
We’ve thoroughly tested the functionality and performance of the new REST API, and we’d like to invite you to test it out before we move production traffic to the new service. Try out your favorite API queries at https://api.production.crossref.org/.
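For anyone scripting their test queries, a request URL can be assembled with the Python standard library. This is only a minimal sketch: the helper name `build_query_url` and the sample query are illustrative assumptions, and the default base URL is the regular api.crossref.org domain (pass a different `base_url` to target another host).

```python
from urllib.parse import urlencode


def build_query_url(route, params, base_url="https://api.crossref.org"):
    """Return a REST API URL for `route` (e.g. "works") with query parameters.

    `build_query_url` is a hypothetical helper for illustration only.
    """
    query_string = urlencode(params)  # percent-encodes keys and values
    url = f"{base_url}/{route.strip('/')}"
    return f"{url}?{query_string}" if query_string else url


if __name__ == "__main__":
    # Example: a free-text query against /works, limited to five rows.
    print(build_query_url("works", {"query": "metadata", "rows": 5}))
```

The resulting URL can be pasted into a browser or passed to any HTTP client.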
Feature parity, but note a few differences
One of our primary objectives was to maintain feature parity between the old and new services, avoiding any breaking changes that might cause problems for existing services integrating with the REST API. We implemented a regression test suite which has given us the confidence to make such a foundational change. During the course of this project, we found it necessary and a good opportunity to make a few modifications. In each case, we analyzed usage and aimed to avoid making any breaking changes. We hope these represent improvements to the behavior and consistency of the REST API.
The group-title filter uses exact matching. This filter previously worked but was undocumented and unsupported.
The directory filter is deprecated. This was meant to be an experimental, unsupported filter, and the data has not met the standard we require.
The affiliation facet returns counts of affiliation strings rather than counts of terms within affiliation fields (thus resolving this GitHub issue).
Cursors may be used to page through results from the /members, /funders, and /journals routes, in addition to /works.
While we suggest that everyone use cursors for pagination, we still support the offset functionality. We have introduced a limit of 80,000 for offset values for the /members, /funders, and /journals routes.
The offset limit now applies to the sum of rows and offset, rather than to offset alone.
The published field is now present in API responses.
The /licenses route returns paged results.
Sorting by submitted is no longer supported. This was never officially supported or documented.
The /quality route has been removed. This was an undocumented, experimental feature.
Funder name in /works metadata is the name provided by the publisher.
Empty relation fields correctly return an empty object.
Only ISBN and isbn-type for a record will be returned. ISBNs for associated volumes will be omitted.
The institution field is a list.
query uses different stop word defaults, though we expect querying to remain roughly the same.
API responses may feature slightly different scores, as they come from different backends.
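Since cursors now work on /members, /funders, and /journals as well as /works, the same paging loop can serve all four routes. The following is a rough sketch, not an official client: it assumes a cursor is started with `cursor=*` and continued with the `next-cursor` value found in the response `message`, so please check the REST API documentation for the authoritative behavior. The names `fetch_json` and `iter_items` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.crossref.org"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_items(route, rows=100, max_pages=None, fetch=fetch_json):
    """Yield items from `route`, following cursors until a page comes back empty.

    Assumes the response carries `message.items` and `message.next-cursor`.
    """
    cursor = "*"  # assumption: "*" asks the API to start a new cursor
    pages = 0
    while max_pages is None or pages < max_pages:
        url = f"{BASE_URL}/{route}?{urlencode({'rows': rows, 'cursor': cursor})}"
        message = fetch(url)["message"]
        items = message.get("items", [])
        if not items:
            break  # an empty page means there is nothing left to read
        yield from items
        cursor = message.get("next-cursor")
        if cursor is None:
            break
        pages += 1


if __name__ == "__main__":
    # Read at most two pages of 20 records each from /journals.
    for record in iter_items("journals", rows=20, max_pages=2):
        print(record.get("title"))
```

Passing the fetch function as a parameter keeps the loop easy to exercise with canned responses before pointing it at the live service.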
Some technical notes on the cutover
Documentation
The above changes are documented in our new REST API documentation, which is now automatically generated via Swagger, resulting in more comprehensive coverage and more efficient feature development. During the cutover, the right documentation for you will depend on which pool you are using. The documentation for the new API can be found by visiting the API in a browser or by navigating to https://api.crossref.org/help, and the docs for the old API remain here: https://github.com/CrossRef/rest-api-doc. The GitHub-hosted documentation will be deprecated once the cutover is complete.
This may not come as news, but bears repeating as we mentioned GitHub. We have moved our source code repositories from GitHub to GitLab, including all of our issue tracking.
Temporary domain
UPDATE: We ended up performing the public pool cutover via reverse proxies instead of redirects, so please disregard the note about temporary domains below. The api.crossref.org domain will remain the domain regardless of which pool you’re using or where we are in the cutover process.
Please note that the api.production.crossref.org domain is a temporary domain we are using during this cutover period. Traffic will be redirected to the new service one pool at a time via an HTTP 307 redirect. Once the cutover is complete, we will go back to using the api.crossref.org domain. Do not update any software, scripts, libraries, tools, etc. to use the temporary domain.
Differences in query results
Due to inherent differences in how Solr and Elasticsearch perform queries and rank results, you may see slightly different results when comparing the old and new services. If for whatever reason your workflow involves using multiple API pools (which we don’t recommend), you may see inconsistent results.
Cursor behavior
Cursors may break if your script is paging through results at the exact moment the cutover is performed, and you should retry your request once the release is complete. We will post the precise maintenance window to https://status.crossref.org/.
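One way to make a paging script tolerant of such an interruption is to restart the whole run after a pause instead of resuming a cursor that may no longer be valid. The sketch below is an illustration under assumptions: `collect_with_retry` is a hypothetical helper, `page_through` stands for whatever zero-argument function your script uses to page through results, and the attempt count and wait time are arbitrary.

```python
import time


def collect_with_retry(page_through, attempts=3, wait_seconds=60, sleep=time.sleep):
    """Run `page_through()` to completion, restarting from scratch if it fails.

    `page_through` is any zero-argument callable returning an iterable of
    records; it should begin with a fresh cursor each time it is called.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Materialise the full result so a mid-run failure discards
            # partial pages instead of mixing results from two cursors.
            return list(page_through())
        except Exception as error:  # deliberately broad for this sketch
            last_error = error
            if attempt < attempts:
                sleep(wait_seconds)  # give the release time to finish
    raise RuntimeError(f"paging failed after {attempts} attempts") from last_error
```

Restarting from scratch costs extra requests, but it avoids stitching together pages served by two different backends.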
Filing issues
Feature requests and bug reports should be filed in the Crossref issue repository in GitLab, both during this testing phase and once the new Elasticsearch-backed API is live in production.
Coming next
While we hope the benefits of improved stability and extensibility are as exciting to you as they are to us, “feature parity” may not be the most thrilling message for our API users. In truth, one of the more exciting aspects of completing this migration is the end of the code freeze we instituted at the start of this effort. Now, we can work on new feature development and a continuous stream of bug fixes. We also improved the automatic test coverage as part of the work, meaning we can deliver features with greater confidence.
The first new feature we’ll be delivering via the REST API will be support for the “grants” record type, allowing for the retrieval of metadata for grants that have been registered with us, now numbering over 20,000 from 8 different funder members. This work is well underway and will be released once we are confident that the new REST API is stable in production. From there, we’ll continue to select the highest priority issues from our REST API backlog.