Looking back over 2024, we wanted to reflect on where we are in meeting our goals, and report on the progress and plans that affect you - our community of 21,000 organisational members as well as the vast number of research initiatives and scientific bodies that rely on Crossref metadata.
In this post, we will give an update on our roadmap, including what is completed, underway, and up next, and a bit about what’s paused and why.
The Crossref2024 annual meeting gathered our community for a packed agenda of updates, demos, and lively discussions on advancing our shared goals. The day was filled with insights and energy, from practical demos of Crossref’s latest API features to community reflections on the Research Nexus initiative and the Board elections.
Our Board elections are always the focal point of the Annual Meeting. We want to start reflecting on the day by congratulating our newly elected board members: Katharina Rieck from Austrian Science Fund (FWF), Lisa Schiff from California Digital Library, Aaron Wood from American Psychological Association, and Amanda Ward from Taylor and Francis, who will officially join (and re-join) in January 2025.
Background The Principles of Open Scholarly Infrastructure (POSI) provides a set of guidelines for operating open infrastructure in service to the scholarly community. It sets out 16 points to ensure that the infrastructure on which the scholarly and research communities rely is openly governed, sustainable, and replicable. Each POSI adopter regularly reviews progress, conducts periodic audits, and self-reports how they’re working towards each of the principles.
In 2020, Crossref’s board voted to adopt the Principles of Open Scholarly Infrastructure, and we completed our first self-audit.
In June 2022, we wrote a blog post “Rethinking staff travel, meetings, and events” outlining our new approach to staff travel, meetings, and events with the goal of not going back to ‘normal’ after the pandemic. We took into account three key areas:
The environment and climate change Inclusion Work/life balance We are aware that many of our members are also interested in minimizing their impacts on the environment, and we are overdue for an update on meeting our own commitments, so here goes our summary for the year 2023!
Even if you already have an API, ours provides additional benefits as it’s a common, standards-based API that works across all members and records. Researchers having to learn many different member APIs for TDM projects doesn’t scale well.
It is up to you to decide formats for your full-text in: some offer PDF, others XML, and some plain text. Some members vary what they deliver depending on the age of the content or other variables. Our API does not provide automatic access to subscription content - access to subscription content is managed on your site using your existing access control systems.
As a member, you need to do two things to enable text and data mining for the metadata records that you have registered with us:
Include the link to full-text in the metadata for each DOI so researchers can follow it to access your content
Include a license URL in the metadata for each DOI so researchers can use this to find out if they have permission to carry out TDM with your content item
If you are concerned about the impact of automated TDM harvesters on your site performance, you may choose to implement rate-limiting headers.
Rate limiting
TDM may change the volume of traffic that your servers have to handle when researchers download large numbers of files in bulk. You can mitigate performance issues with rate limiting.
We have defined a set of standard HTTPS headers that can be used by servers to convey rate-limiting information to automated text and data mining tools. Well-behaved TDM tools can simply look for these headers when they query member sites in order to understand how to behave so as not to affect the site’s performance. The headers allow a member to define a rate limit window - a time span, such as a minute, an hour, or a day. The member can then specify:
Header name
Example value
Explanation
CR-TDM-Rate-Limit
1500
Maximum number of full-text downloads that are allowed to be performed in the defined rate limit window
CR-TDM-Rate-Limit-Remaining
76
Number of downloads left for the current rate limit window
CR-TDM-Rate-Limit-Reset
1378072800
Remaining time (in UTC epoch seconds) before the rate limit resets and a new rate limit window is started
We do not provide or enforce this rate limiting - it’s up to you to implement it if required, and to define a rate limit appropriate for your servers.
Example member site
We have created TinyPub to show an implementation of our API, including rate limiting and IP-based subscription access. You can download this code for reference, but please note that it’s just to illustrate the workings of the system, and is not intended for production.