Citing Data Sets

2 minute read.

Tony Hammond – 2007 March 30

This D-Lib paper by Altman and King looks interesting: “A Proposed Standard for the Scholarly Citation of Quantitative Data”. (And thanks to Herbert Van de Sompel for drawing attention to the paper.) Gist of it (Sect. 3) is

_“We propose that citations to numerical data include, at a minimum, six required components. The first three components are traditional, directly paralleling print documents. … Thus, we add three components using modern technology, each of which is designed to persist even when the technology changes: a unique global identifier, a universal numeric fingerprint, and a bridge service. They are also designed to take advantage of the digital form of quantitative data.

An example of a complete citation, using this minimal version of the proposed standards, is as follows:

**Micah Altman; Karin MacDonald; Michael P. McDonald, 2005, “Computer Use in Redistricting”,

hdl:1902.1/AMXGCNKCLU UNF:3:J0PkMygLPfIyT1E/8xO/EA==

http://id.thedata.org/hdl%3A1902.1%2FAMXGCNKCLU

“_

So the abbreviated citation (author, date, title, unique ID) is supplemented by a UNF which fingerprints the data. UNFs would appear to be a sort of super MD5 in providing a signature of the data content independent of the data serialization to a filestore.

_“Thus, we add as the fifth component a Universal Numeric Fingerprint or UNF. The UNF is a short, fixed-length string of numbers and characters that summarize all the content in the data set, such that a change in any part of the data would produce a completely different UNF. A UNF works by first translating the data into a canonical form with fixed degrees of numerical precision and then applies a cryptographic hash function to produce the short string. The advantage of canonicalization is that UNFs (but not raw hash functions) are format-independent: they keep the same value even if the data set is moved between software programs, file storage systems, compression schemes, operating systems, or hardware platforms.

…

Finally, since most web browsers do not currently recognize global unique identifiers directly (i.e., without typing them into a web form), we add as the sixth and final component of the citation standard a bridge service, which is designed to make this task easier in the medium term.”_

Certainly looks promising. I’m not sure if there’s any other contestants in this arena.

Get involved

Find a service

Documentation

About us

2026 February 03

Innovation in scientific publishing and its implications for Crossref DOI registration practices - MetaROR’s approach

2026 January 28

A spotlight on our community in Indonesia

2026 January 22

Insights from a roundtable on author affiliation metadata

2026 January 14

The GEM program - Year Three and program expansion for 2026

Blog

Citing Data Sets

Further reading

Recent Posts

Categories

Archives