And the DOI is …
Once structured metadata has been added to a file, retrieving a given metadata element is usually a doddle. For example, for PDFs with embedded XMP one can use Phil Harvey’s excellent Exiftool utility.
Exiftool is a Perl library and application, which I’ve blogged about here earlier, and which is available as a ‘.zip’ file for Windows (no Perl required) or a ‘.dmg’ for MacOS. Note that Phil maintains this actively and has done so over the last five years. (And when I say actively I mean just that. I once made the mistake of printing out the change file.)
If Perl’s not your thing, then there’s a Ruby wrapper gem (MiniExiftool) to access the Exiftool command in trouper OO fashion. Here’s an example Ruby one-liner to get the DOI from a PDF (broken here to meet column width restriction):
% ruby -e 'require "mini_exiftool";
    puts MiniExiftool.new("test.pdf")["doi"]'
10.1038/nphoton.2008.200
Of course, that could also have been run against an image, audio or video file with an embedded XMP packet.
(Makes one wonder vaguely about the feasibility of having a Swiss Army knife type of utility that could read any file to get the DOI using the embedded XMP, RDFa, RDF, HTML headers, COinS, etc. Possibly even, as a last resort, falling back to scanning the raw text - if any.)
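To make that last-resort idea a little more concrete, here is a minimal Ruby sketch of the raw-text fallback only. The helper name `find_doi` and the pattern `DOI_PATTERN` are my own illustrative assumptions, not part of Exiftool or MiniExiftool, and the pattern only covers the common "10.registrant/suffix" shape, so it will miss unusual DOIs.

```ruby
# Hypothetical last-resort fallback: scan raw text for the first DOI-like string.
# DOI_PATTERN is an assumption covering the common "10.<registrant>/<suffix>" shape;
# it is not an exhaustive definition of what a DOI may contain.
DOI_PATTERN = /\b10\.\d{4,9}\/[-._;()\/:A-Za-z0-9]+/

# Returns the first DOI-like match in +text+, or nil if nothing matches.
def find_doi(text)
  match = text[DOI_PATTERN]
  return nil unless match
  # Trailing punctuation usually belongs to the surrounding sentence, so trim it.
  # (Assumption: a real DOI ending in one of these characters would be truncated.)
  match.sub(/[.,;:]+\z/, "")
end

puts find_doi("See doi:10.1038/nphoton.2008.200.")
```

A real Swiss Army knife would presumably try the structured sources first (XMP, then HTML headers and so on) and only drop down to a scan like this when they all come up empty.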