This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019, it’s a joint initiative led by the California Digital Library, DataCite, and Crossref that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
Event Data uncovers links between Crossref-registered DOIs and diverse places where they are mentioned across the internet. Whereas a citation links one research article to another, events are a way to create links to locations such as news articles, data sets, Wikipedia entries, and social media mentions. We’ve collected events for several years and make them openly available via an API for anyone to access, as well as creating open logs of how we found each event. Some organisations are already using Event Data and we are keen for more to come on board.
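As a rough illustration of what working with the API can look like, here is a minimal Python sketch that builds a query for events about one DOI and tallies a response by source. The base URL, the `obj-id`, `source`, `rows`, and `mailto` parameters, and the `message`, `events`, and `source_id` response fields are assumptions drawn from the public Event Data documentation and should be checked against the current user guide. The helper names and the sample payload are hypothetical, and the sketch makes no network call.

```python
from urllib.parse import urlencode

# Assumed base URL of the Event Data query API; verify against the current docs.
EVENTS_URL = "https://api.eventdata.crossref.org/v1/events"


def build_events_query(doi, source=None, rows=100, mailto=None):
    """Build a query URL for events whose object is the given DOI.

    Parameter names (obj-id, source, rows, mailto) are assumptions.
    """
    params = {"obj-id": doi, "rows": rows}
    if source:
        params["source"] = source
    if mailto:
        params["mailto"] = mailto
    return f"{EVENTS_URL}?{urlencode(params)}"


def summarize_events(payload):
    """Count events per source in an already-decoded JSON response."""
    counts = {}
    for event in payload.get("message", {}).get("events", []):
        source = event.get("source_id", "unknown")
        counts[source] = counts.get(source, 0) + 1
    return counts


# Hand-written sample shaped like an assumed response; not real data.
sample_response = {
    "status": "ok",
    "message": {
        "events": [
            {"obj_id": "https://doi.org/10.5555/12345678", "source_id": "wikipedia"},
            {"obj_id": "https://doi.org/10.5555/12345678", "source_id": "newsfeed"},
            {"obj_id": "https://doi.org/10.5555/12345678", "source_id": "wikipedia"},
        ]
    },
}

if __name__ == "__main__":
    print(build_events_query("10.5555/12345678", source="wikipedia", rows=10))
    print(summarize_events(sample_response))
```

In practice the URL would be fetched with an HTTP client and the decoded JSON body passed to `summarize_events`; keeping the two steps separate makes the tallying logic easy to test offline.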
Last year we gave an update on Event Data with apologies for being so quiet and a promise of more information at a later date. It’s been some time, so here goes…
I joined Crossref in the middle of last year as a Product Manager and was tasked with looking into Event Data. The first thing I found was a large amount of enthusiasm for Event Data, both within Crossref and further afield. The idea of gathering information beyond the metadata deposited by our members is popular, and creates valuable connections between DOIs and a range of other sources. Interest spans the spectrum of academic research, publishing, bibliometrics, and beyond.
At the same time, I found a project with a very solid, well-built code base but unstable performance. After Event Data was put into production in 2018, we didn’t provide sufficient support. Coupled with staff changes and other competing priorities, this meant that Event Data hasn’t had the opportunity to live up to early expectations.
To address these issues, we have embarked on a plan to make the server infrastructure more robust, improve monitoring, and make sure that the future of Event Data makes the best use of the resources we have without over-stretching. It means working with the community to determine the most essential aspects of Event Data, and providing support where it’s needed.
The steps below are not necessarily sequential and some depend on the completion of work in other parts of Crossref, but they outline the priorities we have for Event Data in 2021.
The Plan
Stability
Since we put our original Event Data infrastructure in place, the amount of incoming data has grown at an ever-increasing rate. In 2017 we were creating 2 million new events per month; that number is now over 20 million. We have known for some time that we need to refresh the infrastructure, but didn’t have the resources to move forward. Now we do.
In the first part of the plan we will renew the server infrastructure that underpins Event Data. Maybe not a headline-grabbing move, but the aim is to reduce downtime and pull in missing data. Through improving our monitoring and shortening the response time when things go wrong, we will be able to ensure that events are added on a regular basis and the API can reliably handle requests.
We’ve taken the first steps in this direction by upgrading our API infrastructure and making some other tweaks to improve performance. There is still work to do, but we’ve already seen a significant improvement, with uptime of around 99.99% in December.
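For a sense of scale, an uptime figure of 99.99% over a 31-day month leaves a downtime budget of under five minutes. The short Python sketch below works through that arithmetic; the function name is illustrative and not part of any Crossref tooling.

```python
def downtime_budget_minutes(uptime_percent, days):
    """Return the minutes of downtime allowed at a given uptime over a period."""
    total_minutes = days * 24 * 60  # 31 days -> 44,640 minutes
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # December has 31 days; 0.01% of 44,640 minutes is roughly 4.5 minutes.
    print(round(downtime_budget_minutes(99.99, 31), 2))
```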
Consolidation
The second component of the plan is to review performance and data quality. We will evaluate the event sources, update artefacts (such as the lists of publisher landing pages and news websites), and review performance reporting. This will give us a better understanding of Event Data in its current form: if the stability component is about improving what comes in and goes out, this part will give us increased confidence in what Event Data already contains.
Future roadmap
While the two steps above are being carried out, we will revisit the applications of Event Data and talk to organizations that currently use it or have expressed an interest. These conversations will feed into future development in which we will evaluate new sources and other ways to optimize the service.
Central to the roadmap will be continued support of the data citation endpoint in Scholix format, which we run in close collaboration with DataCite. Additionally, we will add new data from relationships between Crossref works, for example where a preprint is matched to a journal article, or where there are corrections, retractions, or translations of works.
We expect to continue supporting the current sources of events. Where organizations have either a strong interest in a particular source or a database of events that they can send directly, we are keen to build collaborations. Event Data, like everything that Crossref does, is a community-based effort.
Staying in touch
To join the conversation about Event Data and keep informed, head over to our Community pages. You can also check out our GitLab pages. At the end of last year we updated the Education pages, where you can learn more about Event Data.