In the first half of this year we’ve been talking to our community about post-publication changes and Crossmark. When a piece of research is published it isn’t the end of the journey; it is read, reused, and sometimes modified. That’s why we run Crossmark, as a way to provide notifications of important changes to research made after publication. Readers can see if the research they are looking at has updates by clicking the Crossmark logo.
We’re happy to note that this month we are marking five years since Crossref launched its Grant Linking System. The Grant Linking System (GLS) started life as a joint community effort to create ‘grant identifiers’ and support the needs of funders in the scholarly communications infrastructure.
The system includes a funder-designed metadata schema and a unique link for each award which enables connections with millions of research outputs, better reporting on the research and outcomes of funding, and a contribution to open science infrastructure.
In our previous blog post about metadata matching, we discussed what it is and why we need it (tl;dr: to discover more relationships within the scholarly record). Here, we will describe some basic matching-related terminology and the components of a matching process. We will also pose some typical product questions to consider when developing or integrating matching solutions.
Basic terminology
Metadata matching is a high-level concept, with many different problems falling into this category.
Update 2024-07-01: This post is based on an interview with Euan Adie, founder and director of Overton.
What is Overton?
Overton is a big database of government policy documents, also including sources like intergovernmental organizations, think tanks, and big NGOs, and in general anyone who’s trying to influence a government policy maker. What we’re interested in is basically taking all the good parts of the scholarly record and applying some of that to the policy world.
We believe in Persistent Identifiers. We believe in defence in depth. Today we’re excited to announce an upgrade to our data resilience strategy.
Defence in depth means layers of security and resilience, and that means layers of backups. For some years now, our last line of defence has been a reliable, tried-and-tested technology. One that’s been around for a while. Yes, I’m talking about the humble 5¼ inch floppy disk.
This may come as a surprise to some. When things go well, you’re probably never aware of them. In day-to-day use, the only time a typical Crossref user sees a floppy disk is when they click ‘save’ (yes, some journals still require submissions in Microsoft Word).
History
But why?
Let me take you back to the early days of Crossref. The technology scene was different. This data was too important to trust to new and unproven technologies like Zip disks, CD-Rs or USB Thumb Drives. So we started with punched cards.
IBM 5081-style punched card.
Punched cards are reliable and durable as long as you don’t fold, spindle or mutilate them. But even in 2001 we knew that punched cards’ days were numbered. The capacity of 80 characters kept DOIs short. Translating DOIs into EBCDIC made ASCII a challenge, let alone SICIs. We kept a close eye on the nascent Unicode.
Breathing Room
In 2017 the change of DOI display guidelines from http://dx.doi.org to https://doi.org shortened each DOI by 2 characters, buying us some time. But eventually we knew we had to upgrade to something more modern.
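For anyone who wants to check the two-character saving, here is a minimal Python sketch. It assumes both resolver prefixes are written with a trailing slash; the helper names, the 80-column check and the sample DOI are ours, purely for illustration.

```python
# Resolver prefixes before and after the 2017 display guideline change.
# Assumption: both are counted with their trailing slash.
OLD_PREFIX = "http://dx.doi.org/"
NEW_PREFIX = "https://doi.org/"

# Capacity of one punched card, as mentioned earlier in the post.
CARD_COLUMNS = 80


def characters_saved(old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> int:
    """Return how many characters shorter the new prefix is."""
    return len(old) - len(new)


def fits_on_card(doi: str, prefix: str = NEW_PREFIX, columns: int = CARD_COLUMNS) -> bool:
    """Return True if the full DOI link fits on a single card."""
    return len(prefix + doi) <= columns


if __name__ == "__main__":
    print("Characters saved per DOI link:", characters_saved())
    # "10.5555/12345678" is only a sample value used for illustration.
    print("Fits on one card:", fits_on_card("10.5555/12345678"))
```

Dropping the trailing slash from both prefixes gives the same difference.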
So we migrated to 5¼ inch floppy disks.
5¼ Floppy disk in drive
At 640 KB per disk these were a huge improvement. We could fit around 20,000 DOIs on one floppy. Today we only need around 10,000 floppy disks to store all of our DOIs (not the metadata, just the DOIs). Surprisingly, this takes only about 20 metres of shelf space.
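As a back-of-envelope check, the figures above can be plugged into a few lines of Python. This is only a sketch: it assumes 1 KB means 1,024 bytes, and the function name and dictionary keys are made up for this illustration.

```python
def floppy_estimate(capacity_kb: int = 640,
                    dois_per_disk: int = 20_000,
                    disk_count: int = 10_000,
                    shelf_metres: float = 20.0) -> dict:
    """Derive per-DOI and per-disk figures from the numbers quoted in the post."""
    capacity_bytes = capacity_kb * 1024  # assumption: 1 KB = 1,024 bytes
    return {
        # Average space available for each DOI on a disk.
        "bytes_per_doi": capacity_bytes / dois_per_disk,
        # Total number of DOIs the whole archive can hold.
        "total_dois": dois_per_disk * disk_count,
        # Shelf space taken up by each disk, in millimetres.
        "mm_per_disk": shelf_metres * 1000 / disk_count,
    }


if __name__ == "__main__":
    for name, value in floppy_estimate().items():
        print(f"{name}: {value:,.3f}")
```

With the quoted figures this works out to roughly 33 bytes per DOI, 200 million DOIs in total, and 2 mm of shelf per disk.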
Typical work from home setup. Getting ready to back up some DOIs!
The move to working-from-home brought an unexpected benefit. Staff mail floppy disks to each other and keep them in constant rotation, which produces a distributed, fault-tolerant system.
Persistence Means Change
But it can’t last forever. DOI registration shows no sign of slowing down. It’s clear we need a new, compact storage medium. So, after months of research, we’ve invested in new equipment.
Today we announce our migration to 3½ inch floppies.
If it goes to plan you won’t even notice the change.