Blog

Outage of March 24, 2022

Geoffrey Bilder

Geoffrey Bilder – 2022 March 24

In Data Center, Post Mortem

So here I am, apologizing again. Have I mentioned that I hate computers?

We had a large data center outage. It lasted 17 hours, during which pretty much all Crossref services were unavailable: our main website, our content registration system, our reports, and our APIs. 17 hours was a long time for us, and it was also an inconvenient time for numerous members, service providers, integrators, and users. We apologise for this.

Update on the outage of October 6, 2021

Geoffrey Bilder

Geoffrey Bilder – 2021 October 27

In Data Center, Post Mortem

In my blog post on October 6th, I promised an update on what caused the outage and what we are doing to avoid it happening again. This is that update.

Crossref hosts its services in a hybrid environment. Our original services are all hosted in a data center in Massachusetts, but we host new services with a cloud provider. We also have a few R&D systems hosted with Hetzner.

We know an organization our size has no business running its own data center, and we have been slowly moving services out of the data center and into the cloud.

Outage of October 6, 2021

Geoffrey Bilder

Geoffrey Bilder – 2021 October 06

In Data Center, Post Mortem

On October 6 at ~14:00 UTC, our data centre outside of Boston, MA went down. This affected most of our network services, even ones not hosted in the data centre. The problem was that both our primary and backup network connections went down at the same time. We're not sure why yet; we are consulting with our network provider. It took us two hours to get our systems back online.

Lesson learned, the hard way: Let’s not do that again!

TL;DR

We missed an error that led to the resolution URLs of more than 500,000 records being incorrectly updated. We have reverted the incorrect resolution URLs affected by this problem, and we're putting checks and process changes in place to ensure this does not happen again.

How we got here

Our technical support team was contacted in late June by Wiley about updating the resolution URLs for their content. This is a common request of our technical support team, one meant to make the URL update process more efficient, but this was a particularly large one. Shortly thereafter, Atypon, on behalf of Wiley, provided us with nearly 1,200 separate files to update the resolution URLs of ~9 million records. We manually spot-checked over 50 of these files because, prior to this issue, our technical support team did not have a mechanism to check for errors automatically. That labor-intensive review did not turn up any problems; those 50 samples had none of the header errors that were found later.
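The kind of automated check that was missing here can be sketched in a few lines. This is a hypothetical illustration, not Crossref's actual tooling: it assumes the update files can be read as simple (DOI, URL) pairs, and it flags rows whose DOI or URL is obviously malformed before any update is applied.

```python
from urllib.parse import urlparse

def validate_resolution_urls(rows):
    """Return a list of (row_number, reason) problems found in
    (doi, url) pairs. These checks are illustrative; a real
    pipeline would also verify record ownership, file headers,
    and URL reachability."""
    problems = []
    for i, (doi, url) in enumerate(rows, start=1):
        # Crossref DOIs start with the "10." directory indicator.
        if not doi.startswith("10."):
            problems.append((i, f"malformed DOI: {doi!r}"))
            continue
        parsed = urlparse(url)
        # A usable resolution URL needs an http(s) scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append((i, f"malformed URL: {url!r}"))
    return problems

rows = [
    ("10.1000/example.1", "https://example.com/article/1"),
    ("10.1000/example.2", "htp://example.com/article/2"),  # typo in scheme
    ("not-a-doi", "https://example.com/article/3"),
]
for row, reason in validate_resolution_urls(rows):
    print(f"row {row}: {reason}")
```

Even a shallow pass like this, run over all ~1,200 files rather than a 50-file sample, would surface systematic errors that manual spot checks can miss.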