Test out the early preview of Event Data while we continue to develop it. Share your thoughts. And be warned: we may break a few eggs from time to time!
Chicken by anbileru adaleru from The Noun Project
Want to discover which research works are being shared, liked and commented on? What about the number of times a scholarly item is referenced? Starting today, you can whet your appetite with an early preview of the forthcoming Crossref Event Data service. We invite you to start exploring the activity of DOIs as they permeate and interact with the world after publication.
But first, a bit of background
Discussion around scholarly research increasingly occurs online after publication, for example on blogs, sharing services, social media, and wikis. These “events” occur across the web on numerous platforms and are a critical part of the scholarly enterprise. We are developing an infrastructure service (Crossref Event Data) that collects, stores, and delivers raw data of the events occurring with Crossref DOIs. We will store the data in an open, auditable and portable form for the community to access. Publishers, platforms, funders, bibliometricians and service providers may benefit from access to this raw data, and it can be used to feed into research records or proprietary tools and services that offer aggregation and analysis.
Developers Martin Fenner (DataCite) and Joe Wass (Crossref) enjoy a tofu break
Lagotto, the software originally developed at PLOS, has been extended and improved in a joint effort between DataCite and Crossref. The two DOI Registration Agencies have partnered to envision, build and release the service. On the 13th of April, after a year of collaboration, we jointly released Lagotto 5.0. You can read about the collaboration in the DataCite blog post.
Crossref and DataCite will continue to work closely together to develop Lagotto and the Event Data service. Although Crossref Event Data has mostly Crossref DOIs at launch, you will be able to find DataCite DOIs if they are cited in Crossref or Wikipedia.
All of the software that runs Event Data, including Lagotto, is developed in the open and is open source. Please refer to the Crossref Event Data Technical User Guide for full details.
Preview the data
This service is currently under development, with a full launch expected in the second half of 2016. Before it is launched, however, we invite you to take a look around and preview a subset of the data sources we plan to include. You may experience occasional hiccups while we continue building the service.
At this stage, we are working with data from three sources: you can view Mendeley bookmarks, Wikipedia references, and Crossref to DataCite links. We will greatly expand the variety of platforms from which we collect data as development progresses.
Mendeley
Mendeley is a reference manager and academic social network for scholars. View the number of social bookmarks from scholars or groups on Mendeley.
Wikipedia
Wikipedia is an online encyclopaedia, the Internet’s largest and most popular general reference work. View references to Crossref publications in Wikipedia articles in all languages.
DataCite
DataCite is a global consortium that assigns DOIs to research data. This enables people to find, share, use, and cite data. You can view all the data citations to DataCite research outputs found in Crossref publications (work is underway to make the links found in DataCite metadata available in Event Data).
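To make the idea of raw events from several sources a little more concrete, here is a minimal sketch of how someone consuming the data might tally activity per DOI. Everything in it is an assumption for illustration only: the `Event` record, its field names, the `count_by_source` helper, and the sample DOIs are all hypothetical and do not reflect the actual Event Data schema or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are illustrative only."""
    doi: str       # the DOI the event is about
    source: str    # where the event was observed, e.g. "wikipedia"
    relation: str  # what happened, e.g. "references"


def count_by_source(events, doi):
    """Count the events recorded for one DOI, grouped by source."""
    return Counter(e.source for e in events if e.doi == doi)


# Made-up sample data; these DOIs and events are not real.
events = [
    Event("10.5555/example-a", "mendeley", "bookmarks"),
    Event("10.5555/example-a", "wikipedia", "references"),
    Event("10.5555/example-a", "wikipedia", "references"),
    Event("10.5555/example-b", "datacite", "cites"),
]

counts = count_by_source(events, "10.5555/example-a")
print(dict(counts))
```

Keeping each event as its own record, rather than storing only a running total, mirrors the post's emphasis on raw, auditable data: any aggregate like the one above can be recomputed from the underlying events.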
You can explore the Crossref Event Data early preview by visiting http://eventdata.crossref.org and following the links to featured examples within our interim application for inspecting the data, technical documentation, and our Quick Start guide.
Share your thoughts
This service is still under development, so we welcome your thoughts and feedback on the data we are currently collecting from our three active sources. As a reminder, we expect to include the following sources as part of our full service launch later this year (pending confirmation):
[table id=1 /]
We’re also on the lookout for new data sources to investigate for future inclusion in the Event Data service, so please do get in touch with requests and recommendations. As we continue to build the service throughout 2016, we will be committing to a model of continuous development so that we can make new sources available as they are completed.
Watch this blog for regular updates on our progress, or subscribe to receive new blog posts by email (just add your details to the upper right side of this page).