New public data file: 120+ million metadata records

Jennifer Kemp – 2021 January 19

2020 wasn’t all bad. In April of last year, we released our first public data file. Crossref metadata is always openly available, and our board recently cemented this by voting to adopt the Principles of Open Scholarly Infrastructure (POSI), but we’ve decided to release an updated file because it provides a more efficient way to get such a large volume of records. The file (JSON records, 102.6GB) is now available, with thanks once again to Academic Torrents.

Use of our open APIs continues to grow, as does the metadata. Last year’s file was 112 million records and 65GB. Just nine months later (though it feels longer than that!), the new file is over 120 million records and over 102GB. That’s all of the Crossref records ever registered up to and including January 7, 2021. We continue to see around 10% growth in records each year, and while journal articles account for most of the volume, preprints and book chapters are two of our fast-growing record types. In addition to the growth in the number of records, many of the records are getting bigger and better as members look at their participation report and understand the value of enriching metadata records for distribution throughout the scholarly ecosystem. Elsevier recently opened its references, enriching over 12 million records. A number of members, including Royal Society, Sage, Emerald, OUP and World Scientific, have started adding abstracts, which now number over 9 million.

Help us help you––using the torrent and other important notes

We decided to release these public data files largely to help support COVID-19 research efforts, but of course use cases for Crossref metadata vary widely, and a few pointers should help all users:

Use the torrent if you want all of these records. Everyone is welcome to the metadata but it will be much faster for you and much easier on our APIs to get so many records in one file.
Use the REST API to incrementally add new and updated records once you’ve got the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).
‘Limited’ and ‘closed’ references are not included in the file or our open APIs. And, while bibliographic metadata is generally required, lots of metadata is optional, so records will vary in quality and completeness.
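
As a rough illustration of the incremental approach, here is a minimal Python sketch that pages through records indexed since the file's cut-off date using the REST API's `/works` route with a `from-index-date` filter and cursor-based paging. The function names (`build_url`, `fetch_json`, `iter_updated_records`) are hypothetical helpers for this example only, the choice of `from-index-date` as the right filter for your use case is an assumption, and the `mailto` value is a placeholder for your own contact address.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.crossref.org/works"


def build_url(since, cursor="*", rows=1000, mailto=None):
    """Build a /works query for records indexed on or after `since` (YYYY-MM-DD)."""
    params = {
        "filter": f"from-index-date:{since}",
        "rows": rows,
        "cursor": cursor,  # "*" starts a new cursor-based result set
    }
    if mailto:
        # Identifying yourself with a contact address is part of polite API use.
        params["mailto"] = mailto
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url):
    """Fetch one page of results and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)


def iter_updated_records(since, fetch=fetch_json, mailto=None):
    """Yield records one at a time, following next-cursor until a page is empty."""
    cursor = "*"
    while True:
        message = fetch(build_url(since, cursor, mailto=mailto))["message"]
        items = message.get("items", [])
        if not items:
            break
        yield from items
        cursor = message.get("next-cursor")
        if not cursor:
            break
```

Calling `iter_updated_records("2021-01-07", mailto="you@example.org")` would then walk through everything indexed since the file was produced. A real harvester should also pause between requests and back off on errors, which this sketch leaves out.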

Questions, comments and feedback are welcome at support@crossref.org.

Here’s hoping 2021 is a better year for us all! Stay well.
