Blog

2025 public data file now available

Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.

Drawing on the Research Nexus with Policy documents: Overton’s use of Crossref API

Update 2024-07-01: This post is based on an interview with Euan Adie, founder and director of Overton._

What is Overton?

Overton is a big database of government policy documents, also including sources like intergovernmental organizations, think tanks, and big NGOs and in general anyone who’s trying to influence a government policy maker. What we’re interested in is basically, taking all the good parts of the scholarly record and applying some of that to the policy world. By this we mean finding all the documents, finding what’s out there, collecting metadata for them consistently, fitting to our schema, extracting references from all the policy documents we find, adding links between them, and then we also do citation analysis.

2024 public data file now available, featuring new experimental formats

This year’s public data file is now available, featuring over 156 million metadata records deposited with Crossref through the end of April 2024 from over 19,000 members. A full breakdown of Crossref metadata statistics is available here.

Like last year, you can download all of these records in one go via Academic Torrents or directly from Amazon S3 via the “requester pays” method.

Download the file: The torrent download can be initiated here. Instructions for downloading via the “requester pays” method, along with other tips for using these files, can be found on the “Tips for working with Crossref public data files and Plus snapshots” page.

Subject codes, incomplete and unreliable, have got to go

Patrick Polischuk

Patrick Polischuk – 2024 March 13

In MetadataAPIs

Subject classifications have been available via the REST API for many years but have not been complete or reliable from the start and will soon be deprecated. dfdfd

The subject metadata element was born out of a Labs experiment intended to enrich the metadata returned via Crossref Metadata Search with All Subject Journal Classification codes from Scopus. This feature was developed when the REST API was still fairly new, and we now recognize that the initial implementation worked its way into the service prematurely.

Increasing Crossref Data Reusability With Format Experiments

Martin Eve

Martin Eve – 2024 January 19

In MetadataCommunityAPIs

Every year, Crossref releases a full public data file of all of our metadata. This is partly a commitment to POSI and partly just what we do. We want the community to re-use our metadata and to find interesting ends to which they can be put!

However, we have also recognized, for some time, that 170GB of compressed .tar.gz files, spread over 27,000 items, is not the easiest of formats with which to work. For instance, there’s no indexing capacity on these files, meaning that it is virtually impossible simply to pull out the record for a DOI. Decompressing the .tar.gz files takes a good three hours or more even on high-end hardware, without any additional processing.

2023 public data file now available with new and improved retrieval options

We have some exciting news for fans of big batches of metadata: this year’s public data file is now available. Like in years past, we’ve wrapped up all of our metadata records into a single download for those who want to get started using all Crossref metadata records.

We’ve once again made this year’s public data file available via Academic Torrents, and in response to some feedback we’ve received from public data file users, we’ve taken a few additional steps to make accessing this 185 gb file a little easier.

2022 public data file of more than 134 million metadata records now available

In 2020 we released our first public data file, something we’ve turned into an annual affair supporting our commitment to the Principles of Open Scholarly Infrastructure (POSI). We’ve just posted the 2022 file, which can now be downloaded via torrent like in years past.

We aim to publish these in the first quarter of each year, though as you may notice, we’re a little behind our intended schedule. The reason for this delay was that we wanted to make critical new metadata fields available, including resource URLs and titles with markup.

With a little help from your Crossref friends: Better metadata

Jennifer Kemp

Jennifer Kemp – 2022 March 31

In MetadataLinkingAPIS

We talk so much about more and better metadata that a reasonable question might be: what is Crossref doing to help?

Members and their service partners do the heavy lifting to provide Crossref with metadata and we don’t change what is supplied to us. One reason we don’t is because members can and often do change their records (important note: updated records do not incur fees!). However, we do a fair amount of behind the scenes work to check and report on the metadata as well as to add context and relationships. As a result, some of what you see in the metadata (and some of what you don’t) is facilitated, added or updated by Crossref.

A ROR-some update to our API

Earlier this year, Ginny posted an exciting update on Crossref’s progress with adopting ROR, the Research Organization Registry for affiliations, announcing that we’d started the collection of ROR identifiers in our metadata input schema. 🩁

The capacity to accept ROR IDs to help reliably identify institutions is really important but the real value comes from their open availability alongside the other metadata registered with us, such as for publications like journal articles, book chapters, preprints, and for other objects such as grants. So today’s news is that ROR IDs are now connected in Crossref metadata and openly available via our APIs. 🎉

New public data file: 120+ million metadata records

Jennifer Kemp

Jennifer Kemp – 2021 January 19

In MetadataCommunityAPIs

2020 wasn’t all bad. In April of last year, we released our first public data file. Though Crossref metadata is always openly available––and our board recently cemented this by voting to adopt the Principles of Open Scholarly Infrastructure (POSI)</agic––we’ve decided to release an updated file. This will provide a more efficient way to get such a large volume of records. The file (JSON records, 102.6GB) is now available, with thanks once again to Academic Torrents.