This year, metadata development is one of our key priorities, and we're making a start with the release of version 5.4.0 of our input schema, which includes some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we've added a 'type' attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free as a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including metadata for over 165 million research outputs from over 20,000 members worldwide and making it available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we're delighted to let you know that Crossref members can now use ROR IDs to identify funders anywhere they currently use Funder IDs in their metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It's a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfils the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to make membership more equitable and accessible for organizations in the world's least economically advantaged countries. Eligibility for the program is based on a member's country; our list of eligible countries is largely based on the International Development Association (IDA) list. Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
Crossref has a newish API encompassing works, journals, members, funders and more (check out the API docs), as well as a few other services. Essential to making the Crossref APIs easily accessible—and facilitating easy tool/app creation and exploration—are programmatic clients for popular languages. I’ve maintained an R client for a while now, and have been working on Python and Ruby clients for the past four months or so.
The R client falls squarely into the analytics/research use cases, while the Python and Ruby clients are ideal for general data access and use in web applications (as is the JavaScript library below).
I've strived to make each client idiomatic for its language. Because of this, data outputs don't generally correspond exactly across the different clients. However, I've tried to keep method names similar across Ruby and Python; the R client is quite a bit older, so its method names differ from the other clients, and I'm reluctant to change them so as not to break current users' projects. In addition, R users are likely to want a data.frame (i.e., a table) of results, so that's what we return, whereas the Python and Ruby clients return dictionaries and hashes, respectively.
The serrano command line tool is quite powerful if you're used to working on the command line.
Here we look up a single article; summary data is shown.
serrano works 10.1371/journal.pone.0033693
#> DOI: 10.1371/journal.pone.0033693
#> type: journal-article
#> title: Methylphenidate Exposure Induces Dopamine Neuron Loss and Activation of Microglia in the Basal Ganglia of Mice
There’s also a -json flag to give back JSON data, which can be parsed with the command line tool jq.
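For comparison, the same record can be fetched as JSON straight from the REST API. Here's a minimal sketch in Python using the requests library (not one of the clients above), just to show the shape of the response:

import requests

# Fetch the same DOI as in the serrano example above, directly from the REST API
r = requests.get("https://api.crossref.org/works/10.1371/journal.pone.0033693")
r.raise_for_status()
msg = r.json()["message"]
print(msg["DOI"], msg["type"], msg["title"][0])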
rcrossref also has faster versions of most functions, denoted by an underscore at the end (e.g., cr_works_()), which only do the HTTP request and give back JSON.
Comparison of Crossref Client Methods
After installing and loading the libraries above, the following methods are available.
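As one concrete illustration on the Python side, here is a rough sketch with the habanero client covering a few of those method families; the method names follow the habanero documentation, but treat the exact signatures as an assumption and check the docs for your version:

from habanero import Crossref

cr = Crossref()

# works, journals, members, and funders map onto the corresponding API routes
w = cr.works(query="ecology", limit=3)
j = cr.journals(query="ecology", limit=3)
m = cr.members(query="elife", limit=3)
f = cr.funders(query="science foundation", limit=3)

# each call returns the parsed JSON response as a plain Python dict
print(type(w))
print(w["message"]["items"][0]["DOI"])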
Filters (see the API docs for details) are a powerful way to get closer to exactly what you want in your queries. In the Crossref API, filters are passed as query parameters and are comma-separated, like filter=has-orcid:true,is-update:true. In the client libraries, filters are passed in an idiomatic fashion for each language.
Note how syntax is quite similar among languages, though keys don’t have to be quoted in Ruby and R, and in R you pass in a vector or list instead of a hash as in the other two.
All 3 clients have helper functions to show you what filters are available and what the options are for each filter.
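Going back to the raw query-parameter form mentioned above, here is a small sketch (again plain Python with requests, rather than any of the clients) of what that comma-separated filter looks like in practice:

import requests

# filter is a single query parameter of comma-separated name:value pairs
params = {"filter": "has-orcid:true,is-update:true", "rows": 5}
r = requests.get("https://api.crossref.org/works", params=params)
print(r.json()["message"]["total-results"])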
Sometimes you want a lot of data. The Crossref API has parameters for paging (see rows and offset), but large values of either can lead to long response times and potentially to timeouts (i.e., request failure). The API has a deep paging feature that can be used when large data volumes are desired. This is made possible via Solr's cursor feature (e.g., blog post on it). Here's a rundown of how to use it:
cursor: each method in each client library that allows deep paging has a cursor parameter; set it to * to tell the Crossref API you want deep paging.
cursor_max: for boring reasons, we need feedback from the user about when to stop: each request comes back with a cursor value that we can use to make the next request, so an additional parameter, cursor_max, indicates the total number of results you want back.
limit: when not using deep paging, this parameter determines the number of results to get back; when deep paging, it sets the chunk size per request (note that the maximum value for this parameter is 1000).
For example, cursor="*" states that you want deep paging, cursor_max sets the maximum number of results you want back, and limit determines how many results to fetch per request, as in the sketch below.
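Here is a rough sketch of what the clients do with those three parameters under the hood, written against the raw API with Python's requests; the function name fetch_all is just for illustration:

import requests

def fetch_all(query, cursor_max=500, limit=100):
    # start with cursor=* and follow the next-cursor value in each response
    url = "https://api.crossref.org/works"
    cursor = "*"
    items = []
    while len(items) < cursor_max:
        r = requests.get(url, params={"query": query, "rows": limit, "cursor": cursor})
        r.raise_for_status()
        msg = r.json()["message"]
        batch = msg["items"]
        if not batch:  # no more results to page through
            break
        items.extend(batch)
        cursor = msg["next-cursor"]  # cursor for the next request
    return items[:cursor_max]  # stop at cursor_max results, as the clients do

results = fetch_all("ecology", cursor_max=300, limit=100)
print(len(results))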