This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already, and we hope they will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019 as a joint initiative led by the California Digital Library, DataCite, and Crossref, it fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
Research can be modified after publication, including being corrected or retracted. This is a natural part of the research process and important for accurately reporting changes. While members can deliver this information to us, Retraction Watch has also collected a large number of retractions. Many of these have not been reported by our members.
In September 2023, we acquired the Retraction Watch database from the Center for Scientific Integrity and have made it publicly available. The database contains retractions gathered from publisher websites and is updated every working day by Retraction Watch. Some other update types, such as expressions of concern and corrections, are also included in the data, but these are not as comprehensive as retractions. Various methods are used to find retractions, including searching scholarly databases, checking publisher websites, web searches, and reports from the community. For further details, see this document.
Accessing the Retraction Watch Database
There are two ways to access the Retraction Watch data: via the Crossref REST API or by downloading the full dataset.
REST API
Retractions are included in the update-to field of the JSON records returned by the REST API. Each update has a source field, with a value of either publisher or retraction-watch, which identifies whether it came from the publisher or from Retraction Watch. The following query provides a list of 100 retractions:
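Once records have been retrieved, the source field can be used to separate publisher-supplied updates from those collected by Retraction Watch. The following is only a sketch of that filtering step, not the query mentioned above. It assumes a record has already been parsed into a Python dictionary; the function name updates_from, the sample record, its DOIs, and the DOI and type keys inside each entry are made up for illustration. Only the update-to and source field names come from the description above.

```python
from typing import Any, Dict, List


def updates_from(work: Dict[str, Any], source: str) -> List[Dict[str, Any]]:
    """Return the entries of a record's update-to list that match `source`."""
    # A record without an update-to field simply yields an empty list.
    return [u for u in work.get("update-to", []) if u.get("source") == source]


# Hypothetical record for illustration only; not real data.
SAMPLE_WORK = {
    "DOI": "10.1234/notice.1",
    "update-to": [
        {"DOI": "10.1234/article.1", "type": "retraction", "source": "retraction-watch"},
        {"DOI": "10.1234/article.2", "type": "correction", "source": "publisher"},
    ],
}

from_retraction_watch = updates_from(SAMPLE_WORK, "retraction-watch")
from_publisher = updates_from(SAMPLE_WORK, "publisher")
```

Keeping the source as a parameter lets the same helper serve both values without duplicating the filter.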
The Retraction Watch database is available in CSV format from a git repository. It is updated once per working day. Git is widely used for sharing software code and can also be used for datasets.
To create a local copy of the Retraction Watch metadata file, install git and use the command git clone https://gitlab.com/crossref/retraction-watch-data. This creates a folder called retraction-watch-data. When you want to update to the most recent version, run the command git pull from this folder.
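For anyone who wants to script the clone and update steps, for example in a scheduled job, a minimal sketch follows. It assumes git is installed and on the PATH; the helper names git_command and sync are hypothetical, and the default folder name simply mirrors the one that git clone creates.

```python
import subprocess
from pathlib import Path
from typing import List

REPO_URL = "https://gitlab.com/crossref/retraction-watch-data"


def git_command(target: Path) -> List[str]:
    """Choose 'git pull' if a clone already exists at target, else 'git clone'."""
    if (target / ".git").is_dir():
        # -C runs git as if it had been started inside the target folder.
        return ["git", "-C", str(target), "pull"]
    return ["git", "clone", REPO_URL, str(target)]


def sync(target: Path = Path("retraction-watch-data")) -> None:
    """Clone the repository on first use, or update an existing copy."""
    subprocess.run(git_command(target), check=True)


# Usage (needs network access, so it is not executed here):
# sync()
```

Separating the choice of command from its execution makes the logic easy to check without touching the network.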
Data in the CSV file is comma-separated. Lists within a single entry (such as author names or reasons for retraction) are separated by semicolons. The column headings in the CSV file are:
Record ID: An internal identifier from Retraction Watch.
Title: The title of the retracted or updated content.
Subject: The subject area of the publication.
Institution: Author affiliations, as given in the content.
Journal: The source (serial, book, etc.) in which the research was published.
Publisher: The organisation responsible for publication.
Country: Countries included in author affiliations.
Author: A list of author names.
URLS: Links to relevant pages on the Retraction Watch website, including blog posts about the retraction.
ArticleType: The content type, using a list of types maintained by Retraction Watch. Note that this isn’t the same as the Crossref work type.
RetractionDate: The date of the published retraction.
RetractionDOI: The DOI of the published retraction, if available. If there is no DOI, the value is either blank, ‘unavailable’, or ‘Unavailable’.
RetractionPubMedID: PubMed ID of the published retraction, if available. If there is no PubMed ID, the value is either blank or 0.
OriginalPaperDate: The publication date of the retracted content.
OriginalPaperDOI: The DOI of the retracted publication, if available. If there is no DOI, the value is either blank, ‘unavailable’, or ‘Unavailable’.
OriginalPaperPubMedID: PubMed ID of the original publication, if available. If there is no PubMed ID, the value is either blank or 0.
RetractionNature: The type of update notice, which can be Retraction, Correction, Expression of concern, or Reinstatement. Note that these are different to the list of update types in the Crossref schema.
Reason: A list of reasons for retraction. This uses a controlled vocabulary maintained by Retraction Watch.
Paywalled: Is a fee or paid subscription required to access the retraction notice? Note that there can be cases where this changes some time after publication of the notice.
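The conventions described above (semicolon-separated lists, and several different placeholders for a missing DOI or PubMed ID) usually need normalising before analysis. The sketch below shows one possible way to do that with Python's standard csv module. The sample row is invented for illustration and contains only a subset of the columns; the helper names are hypothetical. Only the Author and Reason columns are treated as lists, because those are the two the description names, and the case-insensitive match on "unavailable" is a convenience that covers both documented spellings.

```python
import csv
import io
from typing import Any, Dict, List, Optional

LIST_COLUMNS = ("Author", "Reason")
DOI_COLUMNS = ("RetractionDOI", "OriginalPaperDOI")
PMID_COLUMNS = ("RetractionPubMedID", "OriginalPaperPubMedID")


def split_list(value: str) -> List[str]:
    """Split a semicolon-separated entry, dropping empty items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def clean_doi(value: str) -> Optional[str]:
    """Map blank, 'unavailable', and 'Unavailable' to None."""
    value = value.strip()
    return None if value.lower() in ("", "unavailable") else value


def clean_pmid(value: str) -> Optional[str]:
    """Map blank and 0 to None."""
    value = value.strip()
    return None if value in ("", "0") else value


def normalise_row(row: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of a row with lists split and missing IDs set to None."""
    out: Dict[str, Any] = dict(row)
    for col in LIST_COLUMNS:
        out[col] = split_list(row.get(col) or "")
    for col in DOI_COLUMNS:
        out[col] = clean_doi(row.get(col) or "")
    for col in PMID_COLUMNS:
        out[col] = clean_pmid(row.get(col) or "")
    return out


def read_rows(handle) -> List[Dict[str, Any]]:
    """Read and normalise every row from an open CSV file handle."""
    return [normalise_row(r) for r in csv.DictReader(handle)]


# Invented sample data; a real file has more columns and real values.
SAMPLE = (
    "Record ID,Author,Reason,RetractionDOI,RetractionPubMedID,"
    "OriginalPaperDOI,OriginalPaperPubMedID\n"
    "1,Jane Doe;John Roe,Reason A;Reason B;,10.1234/retraction.1,0,Unavailable,12345\n"
)
rows = read_rows(io.StringIO(SAMPLE))
```

To run this against the downloaded data, pass a handle opened with open(path, newline="", encoding="utf-8") to read_rows; the encoding is an assumption and may need adjusting.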