This year, metadata development is one of our key priorities, and we’re making a start with the release of version 5.4.0 of our input schema, which includes some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple: we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article but, more importantly, you can also identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider depositing citation types with their records.
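For illustration, a cited dataset with no DOI can still be tagged with its type so that downstream users know what kind of object to look for. The Python sketch below builds such a citation list with the standard library's ElementTree. It assumes element names modelled on the deposit schema (`citation_list`, `citation`, `doi`, `unstructured_citation`), omits the schema namespace, and uses placeholder type values (`journal_article`, `dataset`); check the 5.4.0 schema documentation for the permitted values. The DOI uses the 10.5555 test prefix and the citation text is invented.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_citation(key: str, citation_type: str, unstructured: str,
                  doi: Optional[str] = None) -> ET.Element:
    """Build a <citation> element carrying a 'type' attribute.

    Element names are modelled on the deposit schema; the type values
    passed in below are placeholders, not a confirmed vocabulary.
    """
    citation = ET.Element("citation", {"key": key, "type": citation_type})
    if doi is not None:
        # Only cited works that have a DOI get a <doi> child.
        ET.SubElement(citation, "doi").text = doi
    ET.SubElement(citation, "unstructured_citation").text = unstructured
    return citation


citation_list = ET.Element("citation_list")
# A journal article with a (test-prefix) DOI.
citation_list.append(make_citation(
    "ref1", "journal_article",
    "Doe J. (2020). An example article. Journal of Examples 1, 1-10.",
    doi="10.5555/12345678"))
# A dataset with no identifier: the type still tells users what it is.
citation_list.append(make_citation(
    "ref2", "dataset",
    "Smith A. (2023). Example survey data [Data set]."))

print(ET.tostring(citation_list, encoding="unicode"))
```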
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
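When switching funder metadata to ROR IDs, a deposit pipeline may want to catch malformed identifiers before submission. The sketch below is a hypothetical helper (`normalize_ror_id` is not part of any Crossref or ROR tooling) that accepts an ID with or without the `https://ror.org/` prefix and returns the full URL form. It assumes the documented ROR ID shape (a leading 0, six characters from a base32 alphabet that excludes i, l, o and u, and a two-digit checksum) and only checks that shape: it does not verify the checksum or confirm that the organisation exists in the registry. The ID used in the example is made up.

```python
import re
from typing import Optional

# Assumed ROR ID shape: "0", six base32 characters (no i, l, o, u),
# then a two-digit checksum. The checksum itself is NOT verified here.
ROR_PATTERN = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$")
ROR_PREFIXES = ("https://ror.org/", "http://ror.org/", "ror.org/")


def normalize_ror_id(value: str) -> Optional[str]:
    """Return the https://ror.org/ form of an ID, or None if its shape is wrong."""
    candidate = value.strip().lower()
    for prefix in ROR_PREFIXES:
        if candidate.startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    if not ROR_PATTERN.match(candidate):
        return None
    return "https://ror.org/" + candidate


# Made-up IDs for illustration only.
print(normalize_ror_id("0ABCDEF12"))          # https://ror.org/0abcdef12
print(normalize_ror_id("ror.org/0abcdel12"))  # None ("l" is not allowed)
```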
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019, it is a joint initiative led by the California Digital Library, DataCite and Crossref, and it fulfils the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country, and our list of eligible countries is predominantly based on the countries eligible for support from the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list is reviewed periodically, and countries may be added or removed as economic situations change.
Text and data mining (TDM) is the automatic (bot) analysis and extraction of information from large numbers of documents. Retrieving content for TDM through a common API is more effective than screen-scraping, which is inefficient, error-prone, and fragile. Screen-scraping puts an unnecessary load on member sites (downloading HTML, CSS, JavaScript, and other superfluous web assets), often breaks when members redesign their websites (even slightly), and is typically tied to specific members’ page layouts, so it has to be adapted on a member-by-member basis.
Using the DOI as the basis for TDM in a common API provides several benefits:
An easy way to de-duplicate documents that may be found on several sites. Processing the same document from multiple sites could easily skew TDM results, and traditional techniques for eliminating duplicates (such as hashes) will not work reliably if the document in question exists in several representations (such as PDF, HTML, and ePub) and/or versions (such as the author’s accepted manuscript and the version of record).
Persistent provenance information. Using the DOI as a key allows researchers to retrieve and verify the provenance of the items in the TDM corpus many years into the future, when traditional HTTPS URLs will already have broken.
An easy way to document, share, and compare corpora without having to exchange the actual documents.
A mechanism to ensure the reproducibility of TDM results using the source documents.
A mechanism to track the impact of updates, corrections, retractions, and withdrawals on corpora.
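The de-duplication benefit can be sketched in a few lines of Python: key every harvested file by its normalised DOI, so that a PDF from a publisher site and an HTML copy from a repository collapse into one work even though their hashes differ. The record fields (`doi`, `source`, `format`) and the sample DOIs (using the 10.5555 test prefix) are hypothetical. The normalisation relies on DOIs being case-insensitive and strips common resolver prefixes; it is a simplification, not a full DOI parser.

```python
from collections import defaultdict
from typing import Dict, List

# Common ways a DOI shows up in harvested metadata (not exhaustive).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lower-case the DOI (DOIs are case-insensitive) and drop resolver prefixes."""
    value = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def group_by_doi(documents: List[dict]) -> Dict[str, List[dict]]:
    """Group harvested files by work, regardless of site, format, or version."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for doc in documents:
        groups[normalize_doi(doc["doi"])].append(doc)
    return dict(groups)


# Hypothetical harvest: the first two entries are the same work in two formats.
harvested = [
    {"doi": "10.5555/12345678", "source": "publisher site", "format": "PDF"},
    {"doi": "https://doi.org/10.5555/12345678", "source": "repository", "format": "HTML"},
    {"doi": "10.5555/87654321", "source": "publisher site", "format": "PDF"},
]
corpus = group_by_doi(harvested)
print(len(harvested), "files ->", len(corpus), "distinct works")  # 3 files -> 2 distinct works
```

Because each group is keyed by a DOI rather than a file hash, the same key list also serves to document or share the corpus without exchanging the documents themselves.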
Researchers are increasingly interested in performing TDM with scholarly content. This requires automated access to the full-text content of large numbers of articles. The format of the full-text content varies by member. Our metadata helps researchers get access to this content and enables members to provide it.
How TDM works
A member deposits URLs for their full-text content and license or waiver information (along with other publication metadata) with us
A researcher finds relevant content registered with us (such as journal articles) using a discovery service
The researcher retrieves metadata for each item of registered content, including license information
The researcher makes a full-text request from the member
The member checks the researcher’s subscription rights and returns the full text to them.
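The third and fourth steps can be sketched against the shape of a work record from the Crossref REST API, where full-text links and licenses appear in the `link` and `license` arrays. The sample record below is hand-written and abridged, with invented URLs on the reserved `.example` domain, and the helper names are hypothetical. The sketch only selects the links flagged for text mining and lists the license URLs; the actual full-text request, and the member’s rights check in the final step, are left out.

```python
from typing import List, Tuple

# Hand-written, abridged record shaped like the "message" of a
# /works/{doi} response. DOI and URLs are invented for illustration.
sample_message = {
    "DOI": "10.5555/12345678",
    "license": [
        {"URL": "https://creativecommons.org/licenses/by/4.0/",
         "content-version": "vor", "delay-in-days": 0},
    ],
    "link": [
        {"URL": "https://publisher.example/fulltext/12345678.xml",
         "content-type": "application/xml", "content-version": "vor",
         "intended-application": "text-mining"},
        {"URL": "https://publisher.example/fulltext/12345678.pdf",
         "content-type": "application/pdf", "content-version": "vor",
         "intended-application": "similarity-checking"},
    ],
}


def text_mining_links(message: dict) -> List[Tuple[str, str]]:
    """Return (URL, content type) pairs for links intended for text mining."""
    return [
        (link["URL"], link.get("content-type", "unspecified"))
        for link in message.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]


def license_urls(message: dict) -> List[str]:
    """Return the license URLs the researcher should review before mining."""
    return [lic["URL"] for lic in message.get("license", [])]


for url, content_type in text_mining_links(sample_message):
    # A real client would now request this URL from the member (step 4).
    print(content_type, url)
print(license_urls(sample_message))
```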
Researchers and text miners can access content URLs and license information via our API. If you are a member and would like to begin depositing URLs and access indicators, please contact us.
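For researchers starting from the API rather than a separate discovery service, a request can be narrowed to works that carry full-text links and license information. The Python sketch below only builds the request URL with the standard library and does not send it. The `has-full-text` and `has-license` filter names and the `mailto` parameter are assumptions to verify against the current REST API documentation; the query text and email address are placeholders, and `tdm_query_url` is a hypothetical helper.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"


def tdm_query_url(query: str, rows: int = 20, mailto: Optional[str] = None) -> str:
    """Build (but do not send) a works query restricted to TDM-ready records.

    Filter names are assumptions; check them against the API documentation.
    """
    params: Dict[str, Union[str, int]] = {
        "query": query,
        "filter": "has-full-text:true,has-license:true",
        "rows": rows,
    }
    if mailto:
        # Identifies the caller; the address here is a placeholder.
        params["mailto"] = mailto
    # Keep ":" and "," readable in the filter value.
    return API_BASE + "?" + urlencode(params, safe=":,")


url = tdm_query_url("text and data mining", mailto="researcher@example.org")
print(url)
```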