This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
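As a rough illustration of what a typed citation might look like in a deposit, the sketch below builds a `citation_list` with one typed entry using Python's standard library. The `type` attribute is the change described above; the value `dataset`, the helper name `build_citation`, and the omission of XML namespaces are assumptions made for illustration, so check the 5.4.0 schema for the exact permitted values.

```python
import xml.etree.ElementTree as ET


def build_citation(key, text, citation_type, doi=None):
    """Build one <citation> element carrying a type attribute.

    build_citation is a hypothetical helper; the type values it accepts
    are not validated against the schema here.
    """
    citation = ET.Element("citation", {"key": key, "type": citation_type})
    if doi:
        # An identifier is optional: typed citations may have none.
        ET.SubElement(citation, "doi").text = doi
    ET.SubElement(citation, "unstructured_citation").text = text
    return citation


citation_list = ET.Element("citation_list")
citation_list.append(
    build_citation(
        "ref1",
        "Data from: Extreme genetic structure in a social bird species "
        "despite high dispersal capacity. Dryad Digital Repository.",
        "dataset",  # assumed type value
        doi="10.5061/dryad.684v0",
    )
)

print(ET.tostring(citation_list, encoding="unicode"))
```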
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019 as a joint initiative led by the California Digital Library, DataCite, and Crossref, it fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
As well as providing persistent links to scholarly content, we also provide community infrastructure by linking publications to associated content, making research easy to find, cite, link, and assess. Data citations are a core part of this service, linking publications to their supporting data, making both the research itself and the research process more transparent and reproducible.
Data citations are references to data, just as bibliographic citations make reference to other scholarly sources.
Members deposit data citations by including them in their metadata as references and/or relationship types. Once deposited, data citations across journals (and publishers) are then aggregated and made freely available for the community to retrieve and reuse in a single, shared location.
There are two ways for members to deposit data citation links:
Bibliographic references: The main mechanism for depositing data and software citations is to insert them into an article’s reference metadata. Data citations are included in the deposit of bibliographic references for each publication. Follow the general process for depositing references and apply tags as applicable.
Relationship type: Data links are asserted in the relationship section of the metadata deposit, where they connect the publication to a variety of associated online resources (such as data and software, supporting information, protocols, videos, published peer reviews, preprints, and conference papers) in a structured way, making discovery more powerful and accurate. Here, publishers can identify data that are direct outputs of the research results, if this is known. This level of specificity is optional, but can support scientific validation and research funding management.
The two methods are independent, and can be used individually or together.
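The relationship-type method can be sketched in Python as follows. This is a minimal sketch that mirrors the structure of the deposit snippets in this guide; the helper name `build_data_relation` is hypothetical, and the fragment it produces would still need to be placed inside a full deposit.

```python
import xml.etree.ElementTree as ET

# Namespace used by the relations section in the deposit snippets.
REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("", REL_NS)  # serialize it as the default namespace


def build_data_relation(description, identifier,
                        relationship_type="isSupplementedBy",
                        identifier_type="doi"):
    """Build a <program> fragment linking a publication to one dataset."""
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"relationship-type": relationship_type,
         "identifier-type": identifier_type},
    )
    relation.text = identifier
    return program


# Dataset generated as part of the research (identified by DOI).
generated = build_data_relation(
    "Data from: Extreme genetic structure in a social bird species "
    "despite high dispersal capacity",
    "10.5061/dryad.684v0",
)

# Dataset cited by the research (identified by accession number).
cited = build_data_relation(
    "NKX2-5 mutations causative for congenital heart disease retain "
    "functionality and are directed to hundreds of targets",
    "GSE44902",
    relationship_type="references",
    identifier_type="Accession",
)

print(ET.tostring(generated, encoding="unicode"))
print(ET.tostring(cited, encoding="unicode"))
```

Choosing `isSupplementedBy` or `references` is what distinguishes data generated by the research from data it merely cites.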
| Method | Benefits | Limitations |
| --- | --- | --- |
| Bibliographic references | <ul><li>Data and software citations are automatically deposited when included with the publisher’s reference deposit</li></ul> | <ul><li>Limited to datasets with DataCite DOIs; others cannot be identified and validated from the references deposit</li><li>Noise: not all linked DataCite DOIs are datasets or software (they could be other record types such as articles, slides, or preprints)</li></ul> |
| Relation type | <ul><li>Precise identification of data, differentiated from other content</li><li>Differentiation between datasets generated as part of the research results and those cited by the research</li></ul> | <ul><li>None</li></ul> |
Sending this metadata to Crossref makes it easier for the research community to see links between different research outputs and work with these outputs. It also makes it easier to see these citations, so that researchers can get credit for their data and the sharing of that data.
We collect these citations and make them freely available via our APIs in multiple interfaces (REST, OAI-PMH, OpenURL) and formats (XML, JSON). Data is made openly available to a wide range of organizations and individuals across the extended research ecosystem including funders, research organisations, technology and service providers, indexers, and many others.
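To illustrate how a metadata user might pull data links out of a JSON record once it has been retrieved, here is a small sketch. The field names (`reference`, `DOI`, `relation`, `id`) reflect our understanding of how a work record is laid out in the REST API and should be treated as assumptions; `sample_work` is hand-written for this example and is not real API output.

```python
def collect_data_links(work):
    """Return (source, identifier) pairs found in one work record.

    `work` is assumed to be a dict shaped like a single work record,
    for example the parsed JSON for one DOI.
    """
    links = []
    # References that carry a DOI (bibliographic-reference method).
    for ref in work.get("reference", []):
        if "DOI" in ref:
            links.append(("reference", ref["DOI"]))
    # Explicit relations (relationship-type method).
    for relation_type, targets in work.get("relation", {}).items():
        for target in targets:
            links.append((relation_type, target["id"]))
    return links


# Hand-written sample record, for illustration only.
sample_work = {
    "reference": [
        {"key": "ref1", "DOI": "10.5061/dryad.684v0"},
        {"key": "ref2", "unstructured": "A citation without an identifier"},
    ],
    "relation": {
        "references": [{"id-type": "accession", "id": "GSE44902"}],
    },
}

for source, identifier in collect_data_links(sample_work):
    print(source, identifier)
```

A reference without a DOI is skipped here, which reflects the limitation of the bibliographic-reference method: citations without identifiers are harder to connect.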
Bibliographic references
| Dataset | Snippet of deposit XML containing link |
| --- | --- |
| **Dataset or software generated as part of research article:** Data from: Extreme genetic structure in a social bird species despite high dispersal capacity.<br>**Database:** Dryad Digital Repository<br>**DOI:** https://doi.org/10.5061/dryad.684v0 | `<program xmlns="http://www.crossref.org/relations.xsd">`<br>`<related_item>`<br>`<description>Data from: Extreme genetic structure in a social bird species despite high dispersal capacity</description>`<br>`<inter_work_relation relationship-type="isSupplementedBy" identifier-type="doi">10.5061/dryad.684v0</inter_work_relation>`<br>`</related_item>`<br>`</program>` |
| **Associated dataset or software:** NKX2-5 mutations causative for congenital heart disease retain functionality and are directed to hundreds of targets<br>**Database:** Gene Expression Omnibus (GEO)<br>**Accession number:** GSE44902<br>**URL:** https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE44902 | `<program xmlns="http://www.crossref.org/relations.xsd">`<br>`<related_item>`<br>`<description>NKX2-5 mutations causative for congenital heart disease retain functionality and are directed to hundreds of targets</description>`<br>`<inter_work_relation relationship-type="references" identifier-type="Accession">GSE44902</inter_work_relation>`<br>`</related_item>`<br>`</program>` |