Why data citation is important
Data sharing and citation are important for scientific progress. The three key reasons for this are:
- Transparency and reproducibility: Most scientific results shared today are only a summary of what researchers did and found. The underlying data are not available, which makes it difficult to verify and replicate results. If data were always made available with publications, research would be far more transparent.
- Reuse: The availability of raw data allows other researchers to reuse the data, not just for replication but also to answer new research questions.
- Credit: When researchers cite the data they used, this forms the basis for a data credit system. Right now researchers are not really incentivized to share their data, because nobody is looking at data metrics and measuring their impact. Data citation is a first step towards changing that.
Crossref members deposit data & software links by adding them directly into the standard metadata deposit, as part of the existing Content Registration process. You can add these links to your metadata in one of two ways: via the reference metadata you register with Crossref, or via the relationships section of the schema.
References
The main mechanism for depositing data and software citations is to insert them into an article’s reference metadata. To do so, publishers follow the general process for depositing references.
Publishers can deposit the full data or software citation as an unstructured reference, or they can employ any number of reference tags currently accepted by Crossref. It’s always best to include the DOI (either DataCite or Crossref) for the dataset if possible.
You’ll see additional support for data citations in reference lists in the next version of our schema.
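As a rough illustration of the reference route, the sketch below builds a reference list fragment containing one unstructured data citation with a DOI. It is a minimal sketch, not our deposit tooling: the element names (citation_list, citation, unstructured_citation, doi) and their order are assumptions to check against the current deposit schema, the namespace is omitted for brevity, and the citation text and DOI are made-up example values.

```python
import xml.etree.ElementTree as ET


def build_citation_list(citations):
    """Build a <citation_list> element from (key, text, doi) tuples.

    Element names and ordering are assumptions; verify them against
    the deposit schema before using this for a real deposit.
    """
    citation_list = ET.Element("citation_list")
    for key, text, doi in citations:
        citation = ET.SubElement(citation_list, "citation", {"key": key})
        if doi:
            # Include the dataset DOI whenever one is available.
            ET.SubElement(citation, "doi").text = doi
        ET.SubElement(citation, "unstructured_citation").text = text
    return citation_list


# Hypothetical example values, not a real dataset.
example = build_citation_list([
    (
        "ref1",
        "Doe J. (2020). Example survey dataset. Example Repository.",
        "10.5555/example-dataset",
    ),
])
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

The resulting fragment would sit inside the article's metadata alongside its other references.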
Relationships
We maintain a set of relationship types to support the various content items that a research object, like a journal article, might link to. For data and software, we ask members to provide the following information:
- identifier of the dataset/software
- identifier type: “DOI”, “Accession”, “PURL”, “ARK”, “URI”, “Other”. Additional identifier types beyond those used for data or software are also accepted, including ARXIV, ECLI, Handle, ISSN, ISBN, PMID, PMCID, and UUID.
- relationship type: “isSupplementedBy” or “references” (use the former if the dataset or software was generated as part of the research results).
- description of dataset or software.
Both Crossref and DataCite employ this method of linking. Data repositories that register their content with DataCite follow the same process and apply the same metadata tags. This means that we achieve direct data interoperability, with links in the reverse direction as well (from data and software repositories to journal articles).
You can see illustrations and examples of this schema in our Data & Software Citation guide.
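To make the four pieces of information above concrete, here is a minimal sketch that assembles them into a relationship fragment. The namespace URI, the element names (program, related_item, description, inter_work_relation) and the attribute names are assumptions based on the relations schema and should be checked against the guide; the lowercase identifier type and the DOI are hypothetical example values.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the relationships section of the schema.
REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)

# The two relationship types named above for data and software.
DATA_RELATIONSHIP_TYPES = {"isSupplementedBy", "references"}


def build_relation(identifier, identifier_type, relationship_type, description):
    """Build a relationship fragment linking an article to a dataset."""
    if relationship_type not in DATA_RELATIONSHIP_TYPES:
        raise ValueError(f"unexpected relationship type: {relationship_type}")
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {
            "relationship-type": relationship_type,
            "identifier-type": identifier_type,
        },
    )
    relation.text = identifier
    return program


# Hypothetical dataset generated as part of the research results.
fragment = build_relation(
    identifier="10.5555/example-dataset",
    identifier_type="doi",  # casing expected by the schema is an assumption
    relationship_type="isSupplementedBy",
    description="Survey responses underlying the article",
)
relation_xml = ET.tostring(fragment, encoding="unicode")
print(relation_xml)
```

Rejecting unknown relationship types early is a design choice for this sketch only; the schema itself supports many more relationship types for other kinds of content.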
How to access data & software citations
Crossref and DataCite make the data & software citations deposited by Crossref members and DataCite data repositories openly available for use for anyone within the research ecosystem (funders, research organisations, technology and service providers, research data frameworks such as Scholix, etc.).
Data & software citations from references can be accessed via the Crossref Event Data API. Citations included directly in the metadata by relation type can be accessed via Crossref’s APIs. We’re working to include these relation type citations in the Event Data API as well, so that all data citations will be available via one source.
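For citations deposited by relation type, a consumer might pull the links out of a metadata record retrieved from the API. The sketch below works on an already-parsed record rather than making a network call. It assumes the record carries a "relation" object keyed by hyphenated relation names, each holding a list of targets with "id" and "id-type" fields; that shape, and the sample record itself, are assumptions for illustration.

```python
# Relation keys assumed to correspond to isSupplementedBy and references.
DATA_RELATION_KEYS = ("is-supplemented-by", "references")


def data_links_from_work(work):
    """Return (relation, id_type, id) tuples found in a parsed work record."""
    relation = work.get("relation", {})
    links = []
    for rel_type in DATA_RELATION_KEYS:
        for target in relation.get(rel_type, []):
            links.append((rel_type, target.get("id-type"), target.get("id")))
    return links


# Hypothetical record; real records contain many more fields.
sample_work = {
    "DOI": "10.5555/example-article",
    "relation": {
        "is-supplemented-by": [
            {"id-type": "doi", "id": "10.5555/example-dataset"},
        ],
    },
}

links = data_links_from_work(sample_work)
for rel_type, id_type, identifier in links:
    print(rel_type, id_type, identifier)
```

A record with no relation metadata simply yields an empty list, so the same function can be run over mixed batches of records.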
Scholix Participation
The goal of the Scholix (SCHOlarly LInk eXchange) initiative is to establish a high-level interoperability framework for exchanging information about the links between scholarly literature and data. Crossref members can participate by including article-data links in their deposited metadata, as references and/or relation types, as described above. You don’t need to sign up or let us know that you’re going to start providing this information; just start sending it to us in your reference lists or in the relationship metadata.
If the reference metadata you are registering with Crossref uses Crossref or DataCite DOIs, the linkage between the publications and data is handled by Crossref; nothing more is needed.
If the data (or other research objects) use DOIs from another source, or a different type of persistent identifier, then you need to create a relationship type record instead. This method also allows for the linkage of other research objects.
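The rule in the two paragraphs above can be expressed as a small decision helper. This is only a sketch of the logic as described here: the function name, its arguments and the returned labels are hypothetical, not part of any Crossref tool.

```python
# Agencies whose DOIs are linked automatically from reference metadata,
# according to the rule described above.
AUTO_LINKED_AGENCIES = {"crossref", "datacite"}


def deposit_route(identifier_type, registration_agency=None):
    """Suggest where to deposit an article-data link.

    Returns "reference" when a reference-list citation is enough,
    otherwise "relationship" for a relationship type record.
    """
    is_doi = identifier_type.lower() == "doi"
    agency = (registration_agency or "").lower()
    if is_doi and agency in AUTO_LINKED_AGENCIES:
        return "reference"
    return "relationship"


print(deposit_route("DOI", "DataCite"))        # reference
print(deposit_route("DOI", "Another agency"))  # relationship
print(deposit_route("Accession"))              # relationship
```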
Scholix API Endpoint
The Event Data service implements a Scholix endpoint in the API. A subset of relevant Events (from the ‘crossref’ and ‘datacite’ sources) is available at this endpoint. The filter parameters are the same as specified in the Query API. The response format uses the Scholix schema.
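As a sketch of how a request to this endpoint might be put together, the helper below appends filter parameters to an endpoint URL. The URL shown and the filter names used in the example (obj-id, rows) are assumptions drawn from the Query API conventions and should be confirmed against the Event Data documentation; the DOI is a made-up example value.

```python
from urllib.parse import urlencode

# Assumed location of the Scholix endpoint; confirm before relying on it.
SCHOLIX_ENDPOINT = "https://api.eventdata.crossref.org/v1/events/scholix"


def scholix_url(filters):
    """Build a Scholix endpoint URL from a dict of filter parameters."""
    return f"{SCHOLIX_ENDPOINT}?{urlencode(filters)}"


# Hypothetical query: links whose object is a given dataset DOI.
url = scholix_url({"obj-id": "10.5555/example-dataset", "rows": 10})
print(url)
```

The sketch only constructs the URL; fetching it and parsing the Scholix-formatted response is left to whichever HTTP client you already use.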
Make Data Count
Crossref participates in the Make Data Count initiative. Make Data Count’s focus is on the widespread adoption of standardized data usage and data citation practices, the building blocks for open research data metrics.
Make Data Count’s goals are three-fold:
- Increased adoption of standardized data usage across repositories through enhanced processing and reporting services
- Increased implementations of proper data citation practices at publishers by working in conjunction with publisher advocacy groups and societies
- Promotion of qualitative and quantitative bibliometric studies of data usage and citation behaviors
We’re participating to help support and inform data citation work at publishers in conjunction with existing data citation initiatives, so that we can embed data citation into standard publication workflows and give researchers credit for sharing their data.
If you have questions about registering data citations with us, you can consult other users on our forum, community.crossref.org, or open a ticket with our technical support specialists.