This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple: we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, already collect this information, and we hope they will consider depositing citation types with their records.
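As a rough illustration of the change described above, the sketch below builds a single citation carrying a ‘type’ attribute using Python's standard library. The helper name build_citation, the key, the citation text, and the value "dataset" are all assumptions made for illustration; the set of allowed type values should be checked against the 5.4.0 schema itself.

```python
import xml.etree.ElementTree as ET


def build_citation(key, text, citation_type):
    """Return a <citation> element that carries a 'type' attribute."""
    citation = ET.Element("citation", {"key": key, "type": citation_type})
    # An unstructured citation is enough when no identifier is available.
    ET.SubElement(citation, "unstructured_citation").text = text
    return citation


# "dataset" is an assumed type value; confirm it against the schema.
citation = build_citation(
    key="ref1",
    text="Placeholder Author (2024). Placeholder survey data. Placeholder Repository.",
    citation_type="dataset",
)
citation_xml = ET.tostring(citation, encoding="unicode")
print(citation_xml)
```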
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref. Launched in 2019, it fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
This guide gives markup examples for members registering datasets by direct deposit of XML. It is not currently possible to register the datasets record type using one of our helper tools.
Dataset records capture information about one or more database records or collections. Dataset deposits do not contain the entire database record or collection, only descriptive metadata. The metadata can include:
Contributors: the author(s) of a database record or collection
Title: the title of a database record or collection
Date (within <database_date>): the creation date, publication date (if different from the creation date), and the date of last update of the record
Record number or other identifier (within <publisher_item>): the record number of the dataset item. In this context, <publisher_item> can be used for the record number of each item in the database
Description (within <description>): a brief summary description of the contents of the database
Format: the format type of the dataset item if it includes files rather than just text. Note the format element here should not be used to describe the format of items deposited as part of the component_list
Citations (within <citation_list>): a list of items (such as journal articles) cited by the dataset item. For example, a dataset entry from a taxonomy might cite the article in which a species was first identified.
The dataset_type attribute should be set to either record or collection to indicate the type of deposit. The default value of this attribute is record.
Constructing dataset deposits
<database> is the container for all information about a set of datasets. The top-level database may be a functional database or an abstraction acting as a collection (much like a journal is a collection of articles). Individual dataset entries are captured within the <dataset> element.
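The containment described above can be sketched with Python's standard xml.etree.ElementTree. This is a minimal illustration, not a complete deposit: the helper build_dataset, the DOI, the URL, the names, and the titles are hypothetical, and the surrounding <doi_batch>, <head>, and schema namespace are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_dataset(doi, url, title, given_name, surname, dataset_type="record"):
    """Return a <dataset> element holding descriptive metadata only."""
    if dataset_type not in ("record", "collection"):
        raise ValueError("dataset_type must be 'record' or 'collection'")

    dataset = ET.Element("dataset", {"dataset_type": dataset_type})

    contributors = ET.SubElement(dataset, "contributors")
    person = ET.SubElement(
        contributors,
        "person_name",
        {"contributor_role": "author", "sequence": "first"},
    )
    ET.SubElement(person, "given_name").text = given_name
    ET.SubElement(person, "surname").text = surname

    titles = ET.SubElement(dataset, "titles")
    ET.SubElement(titles, "title").text = title

    doi_data = ET.SubElement(dataset, "doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    ET.SubElement(doi_data, "resource").text = url
    return dataset


# <database> is the container; each entry goes in its own <dataset>.
database = ET.Element("database")
metadata = ET.SubElement(database, "database_metadata", {"language": "en"})
db_titles = ET.SubElement(metadata, "titles")
ET.SubElement(db_titles, "title").text = "Placeholder Database"

database.append(
    build_dataset(
        doi="10.5555/placeholder-dataset-1",      # hypothetical DOI
        url="https://example.org/datasets/1",     # hypothetical landing page
        title="Placeholder dataset title",
        given_name="A",
        surname="Author",
    )
)
database_xml = ET.tostring(database, encoding="unicode")
print(database_xml)
```

Leaving dataset_type at its default produces a record; pass dataset_type="collection" for an entry that represents a collection.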
Datasets that aren’t datasets
The database record type is often used to capture metadata for items that do not fit into our currently defined record types. This may include online collections, videos, archives, and other items that aren’t cited or presented as articles, books, reports, or other defined types of content. Learn more about our supported record types.
Example of a database deposit containing several datasets
<doi_batch xmlns="http://www.crossref.org/schema/4.3.7" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="4.3.7" xsi:schemaLocation="http://www.crossref.org/schema/4.3.7 http://www.crossref.org/schemas/crossref4.3.7.xsd">
  <head>
    <doi_batch_id>2006-03-24-21-57-31-10023</doi_batch_id>
    <timestamp>20060324215731</timestamp>
    <depositor>
      <depositor_name>Sample Master</depositor_name>
      <email_address>support@crossref.org</email_address>
    </depositor>
    <registrant>CrossRef</registrant>
  </head>
  <body>
    <database>
      <database_metadata language="en">
        <titles>
          <title>NURSA Datasets</title>
        </titles>
        <institution>
          <institution_name>Nuclear Receptor Signaling Atlas</institution_name>
          <institution_acronym>NURSA</institution_acronym>
        </institution>
        <doi_data>
          <doi>10.1621/NURSA_dataset_home</doi>
          <resource>http://www.nursa.org/template.cfm?threadId=10222</resource>
        </doi_data>
      </database_metadata>
      <dataset dataset_type="collection">
        <contributors>
          <person_name contributor_role="author" sequence="first">
            <given_name>D</given_name>
            <surname>Mangelsdorf</surname>
          </person_name>
        </contributors>
        <titles>
          <title>Tissue-specific expression patterns of nuclear receptors</title>
        </titles>
        <doi_data>
          <doi>10.1621/datasets.02001</doi>
          <resource>http://www.nursa.org/template.cfm?threadId=10222&amp;dataType=Q-PCR&amp;dataset=Tissue-specific%20expression%20patterns%20of%20nuclear%20receptors</resource>
        </doi_data>
      </dataset>
      <dataset dataset_type="collection">
        <contributors>
          <person_name contributor_role="author" sequence="first">
            <given_name>R</given_name>
            <surname>Evans</surname>
          </person_name>
        </contributors>
        <titles>
          <title>Circadian expression patterns of nuclear receptors</title>
        </titles>
        <doi_data>
          <doi>10.1621/datasets.02002</doi>
          <resource>http://www.nursa.org/template.cfm?threadId=10222&amp;dataType=Q-PCR&amp;dataset=Circadian%20expression%20patterns%20of%20nuclear%20receptors</resource>
        </doi_data>
      </dataset>
    </database>
  </body>
</doi_batch>
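Before submitting a deposit like this one, it can help to confirm locally that the file is well-formed and that each dataset carries the DOI you expect. The sketch below is one way to do that with Python's standard library; it is not a substitute for schema validation. The helper list_datasets is hypothetical, and SAMPLE is an abridged copy of the deposit above. Two details are easy to miss: every element lives in the schema namespace, and ampersands in resource URLs must be written as &amp;amp; in the XML.

```python
import xml.etree.ElementTree as ET

# Elements in the deposit belong to the schema namespace declared on <doi_batch>.
NS = {"cr": "http://www.crossref.org/schema/4.3.7"}

# Abridged copy of the example deposit (head and first dataset omitted).
SAMPLE = """<doi_batch xmlns="http://www.crossref.org/schema/4.3.7" version="4.3.7">
  <body>
    <database>
      <database_metadata language="en">
        <titles><title>NURSA Datasets</title></titles>
      </database_metadata>
      <dataset dataset_type="collection">
        <titles><title>Circadian expression patterns of nuclear receptors</title></titles>
        <doi_data>
          <doi>10.1621/datasets.02002</doi>
          <resource>http://www.nursa.org/template.cfm?threadId=10222&amp;dataType=Q-PCR</resource>
        </doi_data>
      </dataset>
    </database>
  </body>
</doi_batch>"""


def list_datasets(xml_text):
    """Return (dataset_type, doi) pairs; raises ET.ParseError if not well-formed."""
    root = ET.fromstring(xml_text)
    pairs = []
    for dataset in root.iterfind(".//cr:dataset", NS):
        doi = dataset.findtext("cr:doi_data/cr:doi", namespaces=NS)
        # dataset_type defaults to "record" when the attribute is absent.
        pairs.append((dataset.get("dataset_type", "record"), doi))
    return pairs


datasets = list_datasets(SAMPLE)
print(datasets)
```

A raw & in a resource URL makes the parser raise ET.ParseError, so this check catches unescaped ampersands before the deposit is sent.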
How to access data & software citations
Crossref and DataCite make the data & software citations deposited by Crossref members and DataCite data repositories openly available for use by anyone within the research ecosystem (funders, research organisations, technology and service providers, research data frameworks such as Scholix, etc.).
Data & software citations from references can be accessed via our Event Data API. Citations included directly into the metadata by relation type can be accessed via our APIs. We’re working to include these relation type citations in the Event Data API as well, so that all data citations will be available via one source.
Scholix Participation
The goal of the Scholix (SCHOlarly LInk eXchange) initiative is to establish a high-level interoperability framework for exchanging information about the links between scholarly literature and data. Crossref members can participate by including article-data links in their deposited metadata as references and/or relation types, as described above. You don’t need to sign up or let us know you’re going to start providing this information; just start sending it to us in your reference lists or in the relationship metadata.
If the reference metadata you are registering with us uses either Crossref or DataCite DOIs, the linkage between the publications/data is handled by us - nothing more is needed.
If the data (or other research objects) use DOIs from another source, or a different type of persistent identifier, then you need to create a relationship type record instead. This method also allows for the linkage of other research objects.
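A relationship entry of this kind can be sketched as follows, using Python's standard library. The sketch assumes the Crossref relations namespace (http://www.crossref.org/relations.xsd) and its program, related_item, and inter_work_relation elements. The helper build_relation, the identifier, and the chosen relationship-type and identifier-type values are illustrative assumptions; check the relations schema for the values that apply to your content.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)  # serialise with a readable "rel:" prefix


def build_relation(identifier, identifier_type, relationship_type, description=None):
    """Return a <rel:program> element linking a record to another research object."""
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    if description:
        ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"relationship-type": relationship_type, "identifier-type": identifier_type},
    )
    relation.text = identifier
    return program


# Hypothetical non-DOI identifier; the attribute values are assumed examples.
program = build_relation(
    identifier="PLACEHOLDER-ACCESSION-0001",
    identifier_type="accession",
    relationship_type="isSupplementedBy",
    description="Placeholder supporting dataset",
)
relation_xml = ET.tostring(program, encoding="unicode")
print(relation_xml)
```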
Scholix API Endpoint
The Event Data service implements a Scholix endpoint in the API. A subset of relevant Events (from the ‘crossref’ and ‘datacite’ sources) is available at this endpoint. The filter parameters are the same as specified in the Query API. The response format uses the Scholix schema.
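Because the Scholix endpoint accepts the same filters as the Query API, one helper can build URLs for both. The sketch below only constructs the URLs and makes no network request. The base URL, the /scholix path, and the filter names (source, obj-id.prefix, rows, mailto) reflect our reading of the Event Data documentation and should be treated as assumptions to verify; the helper build_query_url, the DOI prefix, and the email address are placeholders.

```python
from urllib.parse import urlencode

# Assumed Event Data Query API base URL; verify against the current documentation.
EVENTS_URL = "https://api.eventdata.crossref.org/v1/events"


def build_query_url(filters, scholix=False):
    """Build a query URL for the events endpoint or its Scholix variant."""
    endpoint = EVENTS_URL + "/scholix" if scholix else EVENTS_URL
    return endpoint + "?" + urlencode(filters)


# Illustrative filters: events from the 'crossref' source whose object DOI
# starts with a given prefix. The prefix and email address are placeholders.
filters = {
    "source": "crossref",
    "obj-id.prefix": "10.5061",
    "rows": 50,
    "mailto": "you@example.org",
}

events_url = build_query_url(filters)
scholix_url = build_query_url(filters, scholix=True)
print(events_url)
print(scholix_url)
```

The same filters dictionary serves both calls; only the endpoint path and the response format (Events versus the Scholix schema) differ.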