Blog

The anatomy of metadata matching

https://doi.org/10.13003/zie7reeg

In our previous blog post about metadata matching, we discussed what it is and why we need it (tl;dr: to discover more relationships within the scholarly record). Here, we will describe some basic matching-related terminology and the components of a matching process. We will also pose some typical product questions to consider when developing or integrating matching solutions.

Basic terminology

Metadata matching is a high-level concept, with many different problems falling into this category. Indeed, no matter how much we like to focus on the similarities between different forms of matching, matching affiliation strings to ROR IDs or matching preprints to journal papers are still different in several important ways. At Crossref and ROR, we call these problems matching tasks.

Metadata matching 101: what is it and why do we need it?

https://doi.org/10.13003/aewi1cai

At Crossref and ROR, we develop and run processes that match metadata at scale, creating relationships between millions of entities in the scholarly record. Over the last few years, we’ve spent a lot of time diving into details about metadata matching strategies, evaluation, and integration. It is quite possibly our favourite thing to talk and write about! But sometimes it is good to step back and look at the problem from a wider perspective. In this blog post, the first in a series about metadata matching, we will cover the very basics of matching: what it is, how we do it, and why we devote so much effort to this problem.

2024 public data file now available, featuring new experimental formats

This year’s public data file is now available, featuring over 156 million metadata records deposited with Crossref through the end of April 2024 from over 19,000 members. A full breakdown of Crossref metadata statistics is available here.

Like last year, you can download all of these records in one go via Academic Torrents or directly from Amazon S3 via the “requester pays” method.

Download the file: The torrent download can be initiated here. Instructions for downloading via the “requester pays” method, along with other tips for using these files, can be found on the “Tips for working with Crossref public data files and Plus snapshots” page.

Common views and questions about metadata across Africa

This past year has been a captivating journey of immersion within the Crossref community, a mix of online interactions and meaningful in-person experiences. From the engaging Sustainability Research and Innovation Conference in Port Elizabeth, South Africa, to the impactful webinars conducted globally, this has been more than just a professional endeavour; it has been a personal exploration of collaboration, insights, and a shared commitment to pushing the boundaries of scholarly communication.

Subject codes, incomplete and unreliable, have got to go


Patrick Polischuk – 2024 March 13

In Metadata, APIs

Subject classifications have been available via the REST API for many years but have not been complete or reliable from the start and will soon be deprecated.

The subject metadata element was born out of a Labs experiment intended to enrich the metadata returned via Crossref Metadata Search with All Subject Journal Classification codes from Scopus. This feature was developed when the REST API was still fairly new, and we now recognize that the initial implementation worked its way into the service prematurely.

RORing ahead: using ROR in place of the Open Funder Registry

A few months ago we announced our plan to deprecate our support for the Open Funder Registry in favour of using the ROR Registry to support both affiliation and funder use cases. The feedback we’ve had from the community has been positive and supports our members, service providers and metadata users who are already starting to move in this direction.

We wanted to provide an update on work that’s underway to make this transition happen, and how you can get involved in working together with us on this.

Increasing Crossref Data Reusability With Format Experiments


Martin Eve – 2024 January 19

In Metadata, Community, APIs

Every year, Crossref releases a full public data file of all of our metadata. This is partly a commitment to POSI and partly just what we do. We want the community to re-use our metadata and to find interesting ends to which they can be put!

However, we have also recognized, for some time, that 170GB of compressed .tar.gz files, spread over 27,000 items, is not the easiest of formats with which to work. For instance, there’s no indexing capacity on these files, meaning that it is virtually impossible simply to pull out the record for a DOI. Decompressing the .tar.gz files takes a good three hours or more even on high-end hardware, without any additional processing.
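
To illustrate why the lack of an index matters, here is a minimal sketch of what pulling out a single DOI looks like without one: a linear scan over every member of an archive. The layout assumed here (a .tar.gz whose members are JSON documents, each holding an "items" list of records with a "DOI" field) and the function name find_record are assumptions made for illustration, not a description of the actual file layout.

```python
import json
import tarfile


def find_record(archive_path, doi):
    """Linearly scan a .tar.gz archive for the record with the given DOI.

    Assumes (hypothetically) that every member is a JSON document of the
    form {"items": [{"DOI": "...", ...}, ...]}.
    """
    wanted = doi.lower()  # DOI names are matched case-insensitively
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            handle = archive.extractfile(member)
            if handle is None:
                continue
            payload = json.load(handle)
            for record in payload.get("items", []):
                if record.get("DOI", "").lower() == wanted:
                    return record
    return None  # every member was decompressed and parsed, no match
```

In the worst case each lookup decompresses and parses the whole archive, which is the cost that an indexed format would avoid.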

Open Funder Registry to transition into Research Organization Registry (ROR)

Today, we are announcing a long-term plan to deprecate the Open Funder Registry. For some time, we have understood that there is significant overlap between the Funder Registry and the Research Organization Registry (ROR), and funders and publishers have been asking us whether they should use Funder IDs or ROR IDs to identify funders. It has therefore become clear that merging the two registries will make workflows more efficient and less confusing for all concerned. Crossref and ROR are therefore working together to ensure that Crossref members and funders can use ROR to simplify persistent identifier integrations, to register better metadata, and to help connect research outputs to research funders.

Metadata connects the global community – summary of our Community update 2023


Kornelia Korzec – 2023 May 12

In Metadata, Community

We were delighted to engage with over 200 community members in our latest Community update calls. We aimed to present a diverse selection of highlights on our progress and discuss your questions about participating in the Research Nexus. For those who didn’t get a chance to join us, I’ll briefly summarise the content of the sessions here and I invite you to join the conversations on the Community Forum.

You can take a look at the slides here and the recordings of the calls are available here.

2023 public data file now available with new and improved retrieval options

We have some exciting news for fans of big batches of metadata: this year’s public data file is now available. Like in years past, we’ve wrapped up all of our metadata records into a single download for those who want to get started using all Crossref metadata records.

We’ve once again made this year’s public data file available via Academic Torrents, and in response to some feedback we’ve received from public data file users, we’ve taken a few additional steps to make accessing this 185 GB file a little easier.