This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref. Launched in 2019, it fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
TL;DR The indexing of Similarity Check users’ content into the shared full-text database is about to get a lot faster. Now we need members’ assistance in helping Turnitin (the company that owns and operates the iThenticate plagiarism checking tool) to transition to a new method of indexing content.
For existing Similarity Check users: please check that your metadata includes full-text URLs so that Turnitin can quickly and easily locate and index your content. Full-text URLs need to be included in 90% of journal article metadata by 31st December 2016.
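As a rough way to see where you stand against the 90% target, the arithmetic can be sketched as below. This is only an illustration: the function name `full_text_coverage`, the sample URLs, and the idea of holding one entry per journal article in a list are all assumptions made for the example, not part of any Crossref or Turnitin tool.

```python
from typing import Iterable, Optional

REQUIRED_COVERAGE = 0.90  # the 90% threshold stated in this post


def full_text_coverage(full_text_urls: Iterable[Optional[str]]) -> float:
    """Return the fraction of records that carry a non-empty full-text URL."""
    urls = list(full_text_urls)
    if not urls:
        return 0.0
    with_url = sum(1 for url in urls if url and url.strip())
    return with_url / len(urls)


# Hypothetical sample: one entry per journal article,
# None where no full-text URL has been deposited.
sample = [
    "https://example.org/articles/1.pdf",
    "https://example.org/articles/2.pdf",
    None,
    "https://example.org/articles/4.pdf",
]

coverage = full_text_coverage(sample)
meets_target = coverage >= REQUIRED_COVERAGE
print(f"coverage: {coverage:.0%}, meets 90% target: {meets_target}")
```

With three of the four sample articles carrying a link, coverage is 75%, which falls short of the target.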
2016 has seen some exciting new developments
(And there are plenty more in store as we strive towards 2017). But first: in April we renamed the service from CrossCheck to Similarity Check and we now have a new service logo available to reference via our logo CDN using the following code.
Earlier this year Crossref also signed a new contract with Turnitin. As part of this, we negotiated the inclusion of dedicated development time each year from Turnitin’s engineering and product teams to focus on developments in the iThenticate tool that will specifically support Similarity Check users and their needs. Many of our members will have been contacted recently by Turnitin and asked to complete a survey regarding how they use the tool and what improvements they would like to see made in the future. The results of this survey are currently being analyzed and will be used by Turnitin to inform a development plan.
Finally, throughout 2016 we have also been working with Turnitin to help them develop a new Content Intake System that provides a faster, more reliable and robust method for collecting data from Crossref and indexing users’ content into the Similarity Check full-text database. Previously Turnitin was only able to collect prefix data from Crossref’s system on a monthly basis whereas today, with the new Content Intake System up and running, they are able to pull full-text content links from deposited metadata on a daily basis. This means that if you are a Similarity Check user currently depositing full-text URLs with Crossref, your content is being indexed by Turnitin faster than ever before.
There are plenty of other benefits this new method provides. This is why we have agreed with Turnitin that from 1st January 2017 onwards, indexing via full-text URLs will be the only method supported for Similarity Check.
Not convinced? Let me share my top four reasons for advocating Turnitin’s exclusive use of the full-text URL indexing method for Similarity Check:
1. Reduced traffic to publisher servers. Indexing via full-text URLs means that the crawl is targeted specifically to the location of the full-text PDF or HTML content, thereby reducing the amount of traffic Turnitin puts through publishers’ servers.
2. Lower margin for error and simplified issue recovery. Turnitin will no longer need to make multiple fetches for any content item, so there are fewer steps in the process. That leaves fewer places for indexing errors to occur and reduces the reliance on users setting meta tags or span tags correctly in their markup. Furthermore, if problems do arise, using one method of indexing for all users means that Turnitin can pinpoint the issue faster and work with members to resolve it quickly.
3. Quicker turnaround on indexing with fewer delays. Turnitin will no longer need to investigate and set up bespoke indexing methods for different Similarity Check users, and they will be able to access the location of full-text content from one place (i.e. within the specific resource tag in members’ metadata deposits). More accurate data from a single location will result in a quicker turnaround on indexing, meaning newly published content will be added into the Similarity Check content database sooner for all members to check other new manuscripts against.
4. Daily ingest is better than monthly! Full-text links can be collected daily from Crossref, rather than monthly as with other methods, meaning a more regular ingest of content.
The presence of full-text URLs within the metadata is critical to the functioning of Turnitin’s new indexing system. All new Similarity Check participants are now asked to ensure they have these links in place within their deposited metadata before they participate in the service.
Already a user of Similarity Check?
If you’re an existing Similarity Check participant who joined the service before 2016, your content is likely to be currently indexed via different methods, such as following links contained in your page meta tags. If you’re not currently depositing full-text links with Crossref for Similarity Check, you will have received an email from us about this in August. If you’re unsure though, you can check your XML to see if you have included the full-text link in the field or you can send us an email at similaritycheck@crossref.org as we’d be happy to check for you.
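If you would like to check your own XML, a short script along these lines may help. It is a sketch under assumptions rather than an official tool: it assumes the full-text link sits in a resource element inside a collection element with property="crawler-based" (with an item whose crawler attribute is "iParadigms"), nested in each record's doi_data. Those element and attribute names are not spelled out in this post, so please confirm them against our help site before relying on the result. The function names and the sample DOIs and URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced deposits also match."""
    return tag.rsplit("}", 1)[-1]


def full_text_links(deposit_xml: str) -> dict:
    """Map each DOI in the deposit to its list of crawler full-text URLs."""
    root = ET.fromstring(deposit_xml.strip())
    result = {}
    for element in root.iter():
        if local_name(element.tag) != "doi_data":
            continue
        doi = None
        links = []
        for child in element:
            name = local_name(child.tag)
            if name == "doi":
                doi = (child.text or "").strip()
            # Assumed layout: <collection property="crawler-based"> holds the links.
            elif name == "collection" and child.get("property") == "crawler-based":
                for node in child.iter():
                    if local_name(node.tag) == "resource" and node.text:
                        links.append(node.text.strip())
        if doi:
            result[doi] = links
    return result


# Hypothetical fragment: the first record has a full-text link, the second does not.
SAMPLE = """
<body>
  <doi_data>
    <doi>10.5555/example.1</doi>
    <resource>https://example.org/articles/1</resource>
    <collection property="crawler-based">
      <item crawler="iParadigms">
        <resource>https://example.org/articles/1.pdf</resource>
      </item>
    </collection>
  </doi_data>
  <doi_data>
    <doi>10.5555/example.2</doi>
    <resource>https://example.org/articles/2</resource>
  </doi_data>
</body>
"""

links_by_doi = full_text_links(SAMPLE)
missing = [doi for doi, links in links_by_doi.items() if not links]
print("DOIs without a full-text link:", missing)
```

Any DOI listed as missing is a record that would need a full-text link added. If in doubt, email us and we will check for you.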
Help, don’t leave me behind!
Us? Never! We’re here to help. But we really do need those full-text links… Everything existing Similarity Check publishers need to know about adding full-text links into new or existing metadata can be found on our help site. These URLs should be included as part of all standard metadata deposits going forward and can be easily added into existing files in bulk. So there’s no need to redeposit the full metadata, unless of course you would prefer to do so!
That’s a wrap
Looking back, it really has been a busy year for Similarity Check and it will continue to be so as we persevere in laying the groundwork for a more streamlined, robust and scalable service for 2017 and beyond. Remember, we need Similarity Check users to ensure they have full-text URLs in at least 90% of their journal article metadata by 31st December 2016 in order to continue using Similarity Check from 2017 onwards.
And please keep us updated! With over 1,200 publishers using Similarity Check, we’ll need a little nudge to know when metadata has been updated to include these links. So once updates have been deposited, please email similaritycheck@crossref.org to confirm. And of course, as always, if there are any questions or if some advice would help, we’re just an email away.