This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
The day I received my learner driver permit, I remember being handed three things: a plastic thermosealed reminder that age sixteen was not a good look on me; a yellow L-plate sign as flimsy as my driving ability; and a weighty ‘how to drive’ guide listing all the things that I absolutely must not, under any circumstances, even-if-it-seems-like-a-really-swell-idea-at-the-time, never, ever do.
The margin space dedicated to finger-wagging left little room for championing any driving-do’s. And as each page delivered a fresh new warning, my enthusiasm for hitting the road sank to levels usually reserved for activities like trigonometry and visits to my orthodontist.
Many years (and an excellent driving record) later, I’m reminded of this when thinking about our own Event Data User Guide, because it contains a chapter with some really important don’ts for our members: really good, we’d-love-you-to-consider-not-doing-these-things type of advice. But despite our intent to encourage, I feel the ghost of finger-waggers past. So in the spirit of championing enthusiasm over ennui, I thought I’d attempt to contextualise our Event Data Best Practices Guide for Publishers and show you why there are a lot of good reasons for publishers to be enthusiastic about these rules.
So if you’re a publisher, I encourage you to read on to learn more about how you can help us have the best chance possible of capturing Events for your content.
What’s in it for you? Well, collecting this data helps to give everyone (Crossref, yourself, and others) a better picture of how your content is being used, including for altmetrics.
1. Please let us in
Please do open the door when we come knocking; we promise not to stay long. You can do this by allowing the User Agent CrossrefEventDataBot to visit your site, and whitelisting it if necessary. The bot is how we visit URLs to confirm whether they point to an item of content registered with us. Reasons we might visit your site include:
someone tweeted an article landing page
someone discussed it on Reddit
it was linked to from a blog post
The Bot has only one job: to work out the DOI. No information beyond this is stored. Whenever we become aware of a link that we think points to a DOI or an Article Landing Page, we follow it so we can collect the required metadata. Everything in Crossref Event Data is linked via its DOI, so it’s important that we can collect this information.
The bot will identify itself using the standard method. It sets two headers:
Once we confirm that a link points to registered content, we then log an Event for the DOI. You should expect our bot to visit no more than once or twice per second, although if there is a period of activity around your articles, you may see higher rates. The bot also takes a sample of DOIs and visits them to work out which domain names belong to our members, so it can maintain a list. This can happen every few weeks, and during this sampling you may see a small number of requests from the bot, limited to one per second.
If we can’t enter your site to look for metadata though, then we won’t be able to collect Events for your DOIs. So by allowing our bot, you will be helping us to collect Event Data for your registered content.
If you’re worried about traffic on your site, consider sending us your mapping of article landing pages to DOIs. Because Resource URLs aren’t the same as article landing pages, we need more information than the DOI Resource URLs that you already send us.
If you’re running a blog or website (and you’re not a member of Crossref), you may also see our bot visiting, to look for links that comprise Events. Please allow us to visit, so we can record in our Event Data service the fact that your website links to registered content.
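If your site filters visitors by User-Agent, a check along the following lines could make sure the bot is let through. This is only a minimal sketch: the name CrossrefEventDataBot comes from this post, while `should_allow` and the `blocked_agents` list are hypothetical stand-ins for whatever filtering your own site does. Bear in mind that a User-Agent header can be set by any client, so this identifies a visitor only as well as the header itself does.

```python
BOT_TOKEN = "CrossrefEventDataBot"  # User Agent name given in this post


def is_event_data_bot(user_agent):
    """Return True if the User-Agent header value names the Event Data bot."""
    return bool(user_agent) and BOT_TOKEN.lower() in user_agent.lower()


def should_allow(user_agent, blocked_agents):
    """Decide whether to serve a request.

    blocked_agents is a hypothetical list of substrings that your site
    normally blocks; the Event Data bot is exempted from that list.
    """
    if is_event_data_bot(user_agent):
        return True
    ua = (user_agent or "").lower()
    return not any(blocked.lower() in ua for blocked in blocked_agents)
```

With a block list such as `["bot"]`, `should_allow("CrossrefEventDataBot", ["bot"])` still lets the Event Data bot in, while other agents matching the list stay blocked.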
2. We ❤️ robots.txt
Robots.txt files are important and we ensure our Event Data Bot respects yours. If we are instructed not to visit a site, we won’t. So if you want us to visit your site in order to check the metadata of your article landing page, please ensure you provide an exception for our Bot, or make sure that you’re not blocking it. Check the restrictions in your file to see if we’re allowed to visit. This is just another way you can help us work for you.
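One way to check your own file is with the robots.txt parser in Python’s standard library. The sketch below assumes the bot matches rules under the name CrossrefEventDataBot (the User Agent named earlier in this post); the two sample files and the example.org URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

BOT_NAME = "CrossrefEventDataBot"  # assumed to be the name rules are matched on


def bot_may_fetch(robots_txt, url, agent=BOT_NAME):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A file that blocks every crawler, including the Event Data bot.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# The same file with an exception added for the Event Data bot.
WITH_EXCEPTION = """\
User-agent: CrossrefEventDataBot
Allow: /

User-agent: *
Disallow: /
"""
```

Calling `bot_may_fetch(BLOCK_ALL, "https://example.org/articles/1")` reports that the bot is shut out, and the same call with `WITH_EXCEPTION` reports that it is allowed, while other crawlers remain blocked.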
3. Include the DC Identifier
Including good metadata is general best practice for scholarly publishing. When we visit a publisher’s site, we look for metadata embedded in the HTML document (such as DC.Identifier tags that, amongst other things, enable Crossmark to work).
By ensuring you include a Dublin Core identifier meta tag in each of your article pages, you make it possible for our system to match your landing pages back to DOIs.
Here’s an example:
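As a rough illustration, the sketch below shows a page whose head carries such a tag and a small parser that reads it back out, much as a visiting bot might. The exact form of the tag (a `name` of `DC.Identifier` and a `content` value starting with `doi:`) is an assumption here, and the DOI `10.5555/12345678` is a made-up test value; check the best practice guide for the form Crossref actually expects.

```python
from html.parser import HTMLParser


class DCIdentifierParser(HTMLParser):
    """Collect the content of every <meta name="dc.identifier"> tag."""

    def __init__(self):
        super().__init__()
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Compare the name case-insensitively: DC.Identifier, dc.identifier, ...
        name = (attr.get("name") or "").lower()
        if name == "dc.identifier" and attr.get("content"):
            self.identifiers.append(attr["content"])


def find_dc_identifiers(html):
    """Return the identifier values found in an HTML document."""
    parser = DCIdentifierParser()
    parser.feed(html)
    return parser.identifiers


# Hypothetical article landing page with an assumed tag format.
SAMPLE_PAGE = """<html><head>
<title>Example article</title>
<meta name="DC.Identifier" content="doi:10.5555/12345678">
</head><body><p>Article text</p></body></html>"""
```

Running `find_dc_identifiers(SAMPLE_PAGE)` on the sample returns a list holding the single value `doi:10.5555/12345678`; a page without the tag returns an empty list.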
4. Let us in, even if we don’t bring cookies
We’re like that friend who turns up for dinner without bringing a bottle of wine. And we hope that you’ll be ok with that. Some publisher sites don’t allow browsers to visit unless cookies are enabled, and they block visitors that don’t accept them. If your site does this, we will be unable to collect Events for your DOIs. Allowing your site to be accessed without cookies will help give us the best chance of successfully reading your metadata.
5. We may not speak your language
Sometimes we come across a publisher’s site that won’t render unless JavaScript is enabled. This means that the site won’t show any content to browsers that don’t execute JavaScript. The Event Data Bot does not execute JavaScript when looking for a DOI, so if your site requires JavaScript, we will be unable to collect Events for your DOIs. Consider allowing your site to be accessed without JavaScript. If this is not possible, include the DC identifier meta tag in the HTML header and we’ll do our best to collect Events for your registered content.
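To see roughly what a visitor without cookies or JavaScript gets from your site, you could fetch a landing page with a plain HTTP client and look for the identifier tag in the raw response. The sketch below is one way to do that with Python’s standard library, not a description of how the Event Data Bot itself works; the function names and the `plain-client-check` User-Agent string are invented for this example.

```python
import re
import urllib.request

# Matches a <meta> tag whose name attribute is dc.identifier (any letter case).
META_PATTERN = re.compile(
    r'<meta[^>]+name=["\']dc\.identifier["\']', re.IGNORECASE
)


def has_identifier_tag(raw_html):
    """Return True if the HTML, as served, contains a DC identifier meta tag."""
    return META_PATTERN.search(raw_html) is not None


def fetch_plain(url, timeout=10):
    """Fetch a page with no cookie handling and no JavaScript execution.

    build_opener() installs no cookie processor unless one is passed in,
    and urllib never runs scripts, so the result is the raw HTML only.
    """
    opener = urllib.request.build_opener()
    request = urllib.request.Request(
        url, headers={"User-Agent": "plain-client-check"}  # hypothetical name
    )
    with opener.open(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If `has_identifier_tag(fetch_plain(your_landing_page_url))` comes back false, or the fetch is refused, a cookie-less, script-less visitor is probably not seeing your metadata either.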
If you want to pass this on to your friendly system administrator, the best practice is documented in full here: https://www.eventdata.crossref.org/guide/best-practice/publishers-best-practice/. And sorry about all the don’ts you’ll find on that page. Don’t let them curb your enthusiasm for taking Event Data out for a spin!