Linking Publications to Data and Software

TL;DR

Crossref and DataCite provide a service to link publications and data. The easiest way for Crossref members to participate is to cite data using DataCite DOIs and to include them in the references within the metadata deposit; these data citations are detected automatically. Alternatively, or in addition, Crossref members can deposit data citations (regardless of identifier) as a relation type in the metadata. Data & software citations from both methods are freely propagated. This blog post also describes how to retrieve the links collected between publications and data & software.


Data & software citation is good research practice (DataCite-STM Joint Statement and FORCE11 Joint Declaration of Data Citation Principles) and is part of the scholarly ecosystem supporting research validation and reproducibility. Data & software citation is also instrumental in enabling the reuse and verification of these research outputs, tracking their impact, and creating a scholarly structure that recognises and rewards those involved in producing them.

Crossref supports the propagation of data & software citations alongside a publisher’s standard bibliographic metadata. Members deposit the data citation link as part of the overall publication metadata when registering their content. Crossref partners with DataCite, and together we provide a clearinghouse for the citations collected. All of these are made freely available to the community as open data.

Citation practices are evolving across different communities of practice. Crossref’s offering is flexible and easily accommodates variations and changes, since it does not rely on a specific set of citation metadata elements, citation format, or manner of credit and attribution. Publishers deposit data & software citations in their metadata via (a) bibliographic references and/or (b) relation types.

Method A: Bibliographic references

Crossref and DataCite have partnered to provide automatic linking between publications registered with Crossref and datasets bearing DataCite DOIs. This is the most efficient and effective way to ensure that data citations are fully integrated into the scholarly research information network with full and accurate metadata.

All data & software citations that include a DataCite DOI are eligible for automatic linking with Crossref. In this method, authors cite the dataset or software, including its DataCite DOI, according to the journal’s submission guidelines and add it to the article’s reference list (cf. FORCE11 citation placement and the FORCE11 Software Citation Principles). Publishers then deposit the references as part of their standard practice when registering content. Crossref checks every deposited reference for a DOI; if the DOI is identified as a DataCite DOI, we automatically link it to the article. With this method, no additional action is needed when publishers register their content with Crossref.
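To make the reference-based route concrete, here is a minimal Python sketch that builds the reference-list portion of a deposit, assuming Crossref's `citation_list`, `citation` (with a `key` attribute), `doi`, and `unstructured_citation` elements. The DOI `10.5061/dryad.example`, the reference keys, and the helper `build_citation_list` are hypothetical; the schema namespace and the surrounding article elements are omitted for brevity, so check element names against the current Crossref deposit schema before relying on them. Putting the dataset's DOI in its own `doi` element gives Crossref an explicit DOI to check.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference list for one article: the first entry cites a dataset
# by its (made-up) DataCite DOI, the second has no DOI at all.
REFERENCES = [
    {"key": "ref1", "doi": "10.5061/dryad.example"},
    {"key": "ref2", "unstructured": "Hypothetical reference text for a work without a DOI."},
]


def build_citation_list(references):
    """Build a <citation_list> fragment (namespace omitted) for a metadata deposit.

    Each reference becomes <citation key="...">. A known DOI goes in a <doi>
    child; otherwise the reference text goes in <unstructured_citation>.
    """
    citation_list = ET.Element("citation_list")
    for ref in references:
        citation = ET.SubElement(citation_list, "citation", key=ref["key"])
        if "doi" in ref:
            ET.SubElement(citation, "doi").text = ref["doi"]
        else:
            ET.SubElement(citation, "unstructured_citation").text = ref["unstructured"]
    return citation_list


if __name__ == "__main__":
    print(ET.tostring(build_citation_list(REFERENCES), encoding="unicode"))
```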

Data citations that point to anything other than a DataCite DOI are exposed through the references only if the publisher makes its references openly available. Even then, the data citation is not distinguished from the other references. Method B, described below, offers another approach.

Method B: Relation type

Publishers can link a publication to a variety of associated research objects directly in the metadata they deposit with Crossref, including data & software, protocols, videos, published peer reviews, preprints, and conference papers. Doing so not only groups these digital objects together but also formally associates them with the publication. Each link is a relationship, and the sum of all these relationships constitutes a ‘research article nexus.’ Data & software citations are a valuable part of this.

To tag the citation in the metadata deposit, we ask for:

  • description of dataset or software (optional)
  • dataset or software identifier
  • identifier type
  • relationship type

Crossref can accommodate research outputs with any identifier, though we currently only validate DOI relationships during metadata processing. Technical details are documented in the Data & Software Citations Deposit Guide.
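As an illustration of the four items listed above, the following Python sketch emits a relations fragment with `xml.etree.ElementTree`. It assumes the element and attribute names of Crossref's relations schema (`rel:program`, `rel:related_item`, `rel:description`, and `rel:inter_work_relation` with `relationship-type` and `identifier-type`). The identifiers, descriptions, and the choice of `isSupplementedBy` versus `references` are hypothetical examples; consult the deposit guide for which relationship type fits a given dataset or piece of software.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)

# Hypothetical links: a dataset identified by DOI and software identified by URI.
LINKS = [
    {
        "description": "Dataset underlying the figures (hypothetical)",
        "identifier": "10.5061/dryad.example",
        "identifier_type": "doi",
        "relationship_type": "isSupplementedBy",
    },
    {
        "identifier": "https://example.org/software/v1.0",
        "identifier_type": "uri",
        "relationship_type": "references",
    },
]


def build_relations_program(links):
    """Build a <rel:program name="relations"> fragment, one related_item per link.

    The description is optional; identifier, identifier type, and
    relationship type are required for every link.
    """
    program = ET.Element(f"{{{REL_NS}}}program", name="relations")
    for link in links:
        item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
        if link.get("description"):
            ET.SubElement(item, f"{{{REL_NS}}}description").text = link["description"]
        relation = ET.SubElement(
            item,
            f"{{{REL_NS}}}inter_work_relation",
            {
                "relationship-type": link["relationship_type"],
                "identifier-type": link["identifier_type"],
            },
        )
        relation.text = link["identifier"]
    return program


if __name__ == "__main__":
    print(ET.tostring(build_relations_program(LINKS), encoding="unicode"))
```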

Combining methods increases total available citations

The two methods are independent and can be used separately or together. Each suits a different set of conditions and practical considerations; the deposit guide compares the benefits and limitations of each. For now, we recommend that publishers use both methods where possible for optimum specificity and coverage.
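Because the two methods are independent, a consumer of the metadata may see the same dataset through both. The minimal, hypothetical sketch below merges the two sets of links and de-duplicates them, treating DOIs case-insensitively (as DOI names are) and stripping a resolver prefix. The function names, input shapes, and identifiers are assumptions for illustration, not part of any Crossref API.

```python
# Resolver prefixes stripped before comparing DOIs (an assumption; extend as needed).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")

# Hypothetical inputs: DOIs found in references (method A) and
# (identifier, relationship type) pairs deposited as relations (method B).
REFERENCE_DOIS = ["10.5061/DRYAD.example", "10.5281/zenodo.example"]
RELATION_LINKS = [
    ("https://doi.org/10.5061/dryad.example", "isSupplementedBy"),
    ("https://example.org/software/v1.0", "references"),
]


def normalise(identifier):
    """Return a comparison key: DOIs lower-cased without a resolver prefix."""
    ident = identifier.strip()
    for prefix in DOI_PREFIXES:
        if ident.lower().startswith(prefix):
            ident = ident[len(prefix):]
            break
    # DOI names are case-insensitive, so compare them in lower case;
    # other identifiers (e.g. URLs) are left exactly as given.
    return ident.lower() if ident.startswith("10.") else ident


def combine_links(reference_dois, relation_links):
    """Map each distinct identifier to the set of methods that supplied it."""
    combined = {}
    for doi in reference_dois:
        combined.setdefault(normalise(doi), set()).add("references")
    for identifier, _relationship_type in relation_links:
        combined.setdefault(normalise(identifier), set()).add("relation type")
    return combined


if __name__ == "__main__":
    for key, methods in sorted(combine_links(REFERENCE_DOIS, RELATION_LINKS).items()):
        print(key, sorted(methods))
```

Here the dataset cited both ways is counted once, while each method also contributes a link the other lacks.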

How to access data & software citations

Crossref and DataCite make the data & software citations deposited by Crossref members and DataCite data repositories openly available to a wide range of parties, including both the Crossref and DataCite communities and the extended research ecosystem (funders, research organisations, technology and service providers, research data frameworks such as Scholix, etc.).

Data & software citations from references can be accessed via the Crossref Event Data API. Citations deposited in the metadata as relation types can be accessed via Crossref’s APIs in a number of formats (REST, OAI-PMH, OpenURL). (A single channel providing data & software citations across these interfaces is in development and will be released next year.)
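For the relation-type route, here is a minimal Python sketch of reading links back from the Crossref REST API. It assumes the `/works/{doi}` route returns a JSON envelope whose `message` holds a `relation` object keyed by relation type (for example `is-supplemented-by`), each entry carrying `id-type` and `id`; treat that shape as an assumption to verify against the current REST API documentation. The sample record and DOIs are hypothetical, and the Event Data route is not shown.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.crossref.org/works/"

# Hypothetical work record, shaped like the assumed "message" of a /works/{doi} response.
SAMPLE_WORK = {
    "DOI": "10.9999/example.article",
    "relation": {
        "is-supplemented-by": [
            {"id-type": "doi", "id": "10.5061/dryad.example", "asserted-by": "subject"}
        ],
        "references": [
            {"id-type": "uri", "id": "https://example.org/software/v1.0", "asserted-by": "subject"}
        ],
    },
}


def fetch_work(doi):
    """Fetch one work's metadata record from the REST API (makes a network call)."""
    url = API_BASE + urllib.parse.quote(doi)  # "/" in the DOI is kept as-is
    with urllib.request.urlopen(url) as response:
        return json.load(response)["message"]


def relation_links(work):
    """Flatten a work's 'relation' object into (relation type, id-type, id) tuples."""
    links = []
    for rel_type, targets in work.get("relation", {}).items():
        for target in targets:
            links.append((rel_type, target.get("id-type"), target.get("id")))
    return links


if __name__ == "__main__":
    # Offline demo on the sample record; with network access, pass fetch_work(doi) instead.
    for link in relation_links(SAMPLE_WORK):
        print(link)
```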

Publishers, visit our detailed guide on how to deposit data and software citations. We welcome your questions and concerns at feedback@crossref.org.

Special thanks to the following people, who provided valuable feedback in developing the guide: Martin Fenner (DataCite), Amye Kenall (Springer Nature), Brooks Hanson (AGU), Shelley Stall (AGU), and the publisher subgroup of the FORCE11 Data Citation Implementation Pilot.

Page owner: Jennifer Lin | Last updated 2016-September-07