At Crossref and ROR, we develop and run processes that match metadata at scale, creating relationships between millions of entities in the scholarly record. Over the last few years, we’ve spent a lot of time diving into details about metadata matching strategies, evaluation, and integration. It is quite possibly our favourite thing to talk and write about! But sometimes it is good to step back and look at the problem from a wider perspective.
This year’s public data file is now available, featuring over 156 million metadata records deposited with Crossref through the end of April 2024 from over 19,000 members. A full breakdown of Crossref metadata statistics is available here.
Like last year, you can download all of these records in one go via Academic Torrents or directly from Amazon S3 via the “requester pays” method.
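With the "requester pays" option, the AWS account making the download, rather than the bucket owner, is billed for the transfer. As a minimal sketch, assuming Python with the boto3 library installed and AWS credentials already configured, a requester-pays download could look like this. The bucket name and object key below are hypothetical placeholders, since they are not given here; substitute the values Crossref publishes alongside the file.

```python
from pathlib import Path

# Hypothetical placeholders: the real bucket name and object key for the
# public data file are not stated here; use the values Crossref publishes.
BUCKET = "example-crossref-public-data"  # hypothetical
KEY = "example/public-data-file.tar"  # hypothetical


def requester_pays_extra_args() -> dict:
    """ExtraArgs that tell S3 the caller, not the bucket owner, pays."""
    return {"RequestPayer": "requester"}


def download_public_data_file(bucket: str, key: str, dest: Path) -> Path:
    """Download one object from a requester-pays S3 bucket using boto3.

    Needs configured AWS credentials; transfer charges are billed to the
    account behind those credentials.
    """
    import boto3  # imported lazily so this module loads without boto3

    s3 = boto3.client("s3")
    # download_file handles multipart transfers for large objects.
    s3.download_file(bucket, key, str(dest), ExtraArgs=requester_pays_extra_args())
    return dest
```

Calling `download_public_data_file(BUCKET, KEY, Path("crossref-public-data.tar"))` with the real bucket and key would start the transfer. Without the `RequestPayer` argument, S3 rejects requests to a requester-pays bucket from other accounts with an access-denied error.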
Download the file: The torrent download can be initiated here.
Earlier this year, we reported on the roundtable discussion event that we organised in Frankfurt on the heels of the Frankfurt Book Fair 2023. This event was the second in a series of roundtable events we are holding with our community to hear how we can all work together to preserve the integrity of the scholarly record (ISR). You can read more about insights from these events, and about ISR, in this series of blogs.
Crossref is undertaking a large program, dubbed 'RCFS' (Resourcing Crossref for Future Sustainability) that will initially tackle five specific issues with our fees. We haven’t increased any of our fees in nearly two decades, and while we’re still okay financially and do not have a revenue growth goal, we do have inclusion and simplification goals. This report from Research Consulting helped to narrow down the five priority projects for 2024-2025 around these three core goals:
Crossref’s Similarity Check service is used by our members to detect text overlap with previously published work that may indicate plagiarism of scholarly or professional works. Manuscripts can be checked against millions of publications from other participating Crossref members and general web content using the iThenticate text comparison software from Turnitin.
The 2,000 members who already use Similarity Check upload almost 2,000,000 documents each month to look for matching text in other publications.
We have some great news for those 2,000 members: a completely new version of iThenticate is on its way and will start to roll out to users in the coming months.
New functionality has been developed based on your feedback over the past few years and includes:
An improved Document Viewer that makes PDFs searchable and accessible, with responsive design for ease of use on different screen sizes. All of the functionality of the Viewer and the Text-only reports in the previous version has been streamlined into just two views: Sources Overview and All Sources.
Improved exclusion options to make refining matches even easier. Smarter citation detection now identifies probable citations both inline and in reference sections.
A new “Content Portal” where you can see what percentage of your own content has been successfully indexed for the iThenticate comparison database, and download reports of indexing errors that need to be fixed.
A new API for integration with manuscript submission systems allows the largest matching word count and the top 5 source matches to be displayed alongside the Similarity Score.
The maximum page count and file size per document have been doubled, to 800 pages and 200 MB.
Crossref members can use Similarity Check directly by logging in, or via an integration with a submission/peer review system. We are working with many system providers to bring v2.0 to you as soon as possible. In the meantime, we are looking for members to help us test the new system directly in the iThenticate user interface. If you are interested and can spare a few hours sometime in the next month, please let me know.
And if your organization is not yet using Similarity Check to assess the originality of the manuscripts you receive, do take a look at the many benefits the service has to offer.