Blog

What if I told you that bibliographic references can be structured?

Last year I spent several weeks studying how to automatically match unstructured references to DOIs (you can read about these experiments in my previous blog posts). But what about references that are not in the form of an unstructured string, but rather a structured collection of metadata fields? Are we matching them, and how? Let’s find out.

A simpler text query form

The Simple Text Query form (STQ) allows users to retrieve existing DOIs for journal articles, books, and chapters by cutting and pasting a reference or reference list into a simple query box. For years the service has been heavily used by students, editors, researchers, and publishers eager to match and link references.

We had changes to the service planned for the first half of this year - an upgraded reference matching algorithm, a more modern interface, etc. In the spirit of openness and transparency, part of our project plan was to communicate these pending changes to STQ users well in advance of our 30 April completion date. What would users think? Could they help us improve upon our plans?

Reference matching: for real this time

In my previous blog post, Matchmaker, matchmaker, make me a match, I compared four approaches for reference matching. The comparison was done using a dataset composed of automatically-generated reference strings. Now it’s time for the matching algorithms to face the real enemy: the unstructured reference strings deposited with Crossref by some members. Are the matching algorithms ready for this challenge? Which algorithm will prove worthy of becoming the guardian of the mighty citation network? Buckle up and enjoy our second matching battle!

Matchmaker, matchmaker, make me a match

Matching (or resolving) bibliographic references to target records in the collection is a crucial task in the Crossref ecosystem. Automatic reference matching lets us discover citation relations in large document collections, calculate citation counts, H-indexes, impact factors, etc. At Crossref, we currently use a matching approach based on reference string parsing. Some time ago we realized there is a much simpler approach. And now it is finally battle time: which of the two approaches is better?

What does the sample say?

At Crossref Labs, we often come across interesting research questions and try to answer them by analyzing our data. Depending on the nature of the experiment, processing over 100M records might be time-consuming or even impossible. In those dark moments we turn to sampling and statistical tools. But what can we infer from only a sample of the data?
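As a rough illustration of the kind of inference a sample allows, here is a minimal sketch that estimates the fraction of records satisfying some property from a random sample, with a normal-approximation 95% confidence interval. The function name, its parameters, and the toy population are assumptions made for this example, not the method used in the post.

```python
import math
import random


def estimate_proportion(population, predicate, sample_size, seed=0, z=1.96):
    """Estimate the share of items satisfying `predicate` from a random sample.

    Hypothetical helper: returns (estimate, lower, upper), where the bounds
    come from the normal approximation p +/- z * sqrt(p * (1 - p) / n).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(population, sample_size)  # sampling without replacement
    hits = sum(1 for item in sample if predicate(item))
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    # Clamp the interval to the valid range for a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Toy population standing in for a large record collection (an assumption).
records = list(range(1000))
estimate, lower, upper = estimate_proportion(records, lambda r: r % 2 == 0, 200)
print(f"estimated share: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The interval narrows roughly with the square root of the sample size, which is why a few hundred records can already say something useful about a very large collection.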

Linking references is different from registering references

From time to time we get questions from members asking what the difference is between reference linking and registering references as part of the Content Registration process. Here’s the distinction: Linking out to other articles from your reference lists is a key part of being a Crossref member. It’s an obligation in the membership agreement, and it levels the playing field when all members link their references to one another.

Revised Crossref DOI display guidelines are now active

We have updated our DOI display guidelines as of March 2017, this month! I described the what and the why in my previous blog post New Crossref DOI display guidelines are on the way and in an email I wrote to all our members in September 2016. I’m pleased to say that the updated Crossref DOI display guidelines are available via this fantastic new website and are now active. Here is the URL of the full set of guidelines in case you want to bookmark it (https://doi.org/10.13003/5jchdy) and a shareable image to spread the word on social media.

Included, registered, available: let the preprint linking commence.

We began accepting preprints as a new record type last month (in a category known as “posted content” in our XML schema). Over 1,000 records have already been registered in the first few weeks since we launched the service.

By extending our existing services to preprints, we want to help make sure that:

  • links to these publications persist over time
  • they are connected to the full history of the shared research
  • the citation record is clear and up-to-date.

New Crossref DOI display guidelines are on the way

TL;DR

Crossref will be updating its DOI Display Guidelines within the next couple of weeks. This is a big deal. We last made a change in 2011 so it’s not something that happens often or that we take lightly. In short, the changes are to drop “dx” from DOI links and to use “https:” rather than “http:”. An example of the new best practice in displaying a Crossref DOI link is: https://doi.org/10.1629/22161
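The two changes described above (dropping “dx” and switching to “https:”) can be sketched as a small link rewriter. This is a minimal illustration, not an official Crossref tool: the function name to_display_doi is hypothetical, and the handling of bare DOIs and “doi:” prefixes is an assumption that goes beyond what the guidelines excerpt says.

```python
import re

# Matches resolver prefixes with http or https and an optional "dx." host part.
_RESOLVER_PREFIX = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def to_display_doi(link: str) -> str:
    """Rewrite a DOI link into the https://doi.org/ form (hypothetical helper)."""
    link = link.strip()
    if _RESOLVER_PREFIX.match(link):
        # Covers http://dx.doi.org/, https://dx.doi.org/ and http://doi.org/.
        return _RESOLVER_PREFIX.sub("https://doi.org/", link, count=1)
    if link.lower().startswith("doi:"):
        # Assumption: treat "doi:10.x/..." as a DOI to be displayed as a link.
        return "https://doi.org/" + link[4:].strip()
    if link.startswith("10."):
        # Assumption: a bare DOI starts with the "10." directory indicator.
        return "https://doi.org/" + link
    return link  # anything else is left unchanged


print(to_display_doi("http://dx.doi.org/10.1629/22161"))
```

Links that are already in the new form pass through unchanged, so the function can be applied repeatedly without altering the result.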

Linked Clinical Trials initiative gathers momentum

We now have linked clinical trials deposits coming in from five publishers: BioMedCentral, BMJ, Elsevier, National Institute for Health Research and PLOS. It’s still a relatively small pool of metadata - around 4000 DOIs with associated clinical trial numbers - but we’re delighted to see that “threads” of publications are already starting to form. If you look at this article in The Lancet and click on the Crossmark button you will see that in the Clinical Trials section there are links to three other articles reporting on the same trial: two from the American Heart Journal and one from BMJ’s Heart.