Metadata - Crossref

Add Crossref metadata to PDFs using XMP

Geoffrey Bilder – 2009 December 09

In order to encourage publishers and other content producers to embed metadata into their PDFs, we have released an experimental tool called “pdfmark”. This open source tool allows you to add XMP metadata to a PDF. What’s really cool is that if you give the tool a Crossref DOI, it will look up the metadata in Crossref and then apply that metadata to the PDF. More detail can be found on the pdfmark page on the Crossref Labs site. The usual weasel words and excuses about “experiments” apply.
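
To give a feel for what “XMP metadata” means in practice, here is a rough Python sketch that builds a minimal XMP packet carrying a title and a DOI. This is not pdfmark's code, and it does not write anything into a PDF. The function names, the choice of properties, and the use of the PRISM 2.0 namespace for the DOI are all assumptions made for illustration.

```python
from xml.sax.saxutils import escape

# Minimal XMP skeleton: an RDF description with a Dublin Core title and a
# PRISM DOI. The selection of properties is an illustrative assumption.
XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt></dc:title>
   <prism:doi>{doi}</prism:doi>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def build_xmp(doi, title):
    """Fill the template, escaping characters that are special in XML."""
    return XMP_TEMPLATE.format(doi=escape(doi), title=escape(title))


def wrap_packet(xmp):
    """Add the xpacket processing instructions that delimit an XMP packet."""
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    trailer = '<?xpacket end="w"?>'
    return header + "\n" + xmp + "\n" + trailer


if __name__ == "__main__":
    # Hypothetical DOI and title, for illustration only.
    print(wrap_packet(build_xmp("10.1000/182", "An Example Article")))
```

A real tool would get the title and other fields from a metadata lookup and would then embed the packet in the PDF's metadata stream, which this sketch leaves out.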

Recommendations on RSS Feeds for Scholarly Publishers

Geoffrey Bilder – 2009 October 19

In Interoperability, Metadata, News Release, RSS

We’re pleased to announce that a Crossref working group has released a set of best practice recommendations for scholarly publishers producing RSS feeds.

Variations in practice amongst publisher feeds can be irritating for end-users, but they can be insurmountable for automated processes. RSS feeds are increasingly being consumed by knowledge discovery and data mining services. In these cases, variations in date formats, the practice of lumping all authors together in one <dc:creator> element, or generating invalid XML can render the RSS feed useless to the service accessing it.
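
The kinds of problem mentioned above are easy to check for mechanically. Here is a small Python sketch of a feed “lint” that flags invalid XML, several authors lumped into one dc:creator element, and dc:date values that are not ISO 8601 style. The function name, the heuristics, and the choice of ISO 8601 as the target date format are my assumptions for illustration; they are not quoted from the working group's recommendations.

```python
import re
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Assumed target format: YYYY-MM-DD, optionally followed by a time and zone.
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2}))?$"
)
# Crude heuristic for "several authors in one element".
MULTI_AUTHOR = re.compile(r";|\band\b|&", re.IGNORECASE)


def lint_feed(xml_text):
    """Return a list of human-readable problems found in a feed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Nothing else can be checked if the feed does not parse.
        return [f"invalid XML: {exc}"]

    problems = []
    for creator in root.iter(DC + "creator"):
        text = (creator.text or "").strip()
        if MULTI_AUTHOR.search(text):
            problems.append(f"several authors in one dc:creator: {text!r}")
    for date in root.iter(DC + "date"):
        text = (date.text or "").strip()
        if not ISO_DATE.match(text):
            problems.append(f"non-ISO date: {text!r}")
    return problems


# Hypothetical feed fragment showing two of the problems.
BAD_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/a1">
    <title>Example</title>
    <link>http://example.org/a1</link>
    <dc:creator>Jane Doe and Richard Roe</dc:creator>
    <dc:date>19 Oct 2009</dc:date>
  </item>
</rdf:RDF>"""

if __name__ == "__main__":
    for problem in lint_feed(BAD_FEED):
        print(problem)
```

The author heuristic is deliberately naive (a name written as “Doe, Jane” passes, while “Jane Doe and Richard Roe” is flagged), so treat it as a starting point rather than a validator.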

Citation Typing Ontology

Geoffrey Bilder – 2009 March 20

In Citation Formats, Data, Identifiers, Linking, Metadata

I was happy to read David Shotton’s recent Learned Publishing article, Semantic Publishing: The Coming Revolution in scientific journal publishing, and see that he and his team have drafted a Citation Typing Ontology.*

Anybody who has seen me speak at conferences knows that I often like to proselytize about the concept of the “typed link”, a notion that hypertext pioneer Randy Trigg discussed extensively in his 1983 Ph.D. thesis. Basically, Trigg points out something that should be fairly obvious: a citation (i.e. “a link”) is not always a “vote” in favor of the thing being cited.

In fact, there are all sorts of reasons that an author might want to cite something. They might be elaborating on the item cited, they might be critiquing the item cited, they might even be trying to refute the item cited. (For an exhaustive and entertaining survey of the use and abuse of citations in the humanities, Anthony Grafton’s The Footnote: A Curious History is a rich source of examples.)

Unfortunately, the naive assumption that a citation is tantamount to a vote of confidence has become enshrined in everything from the way in which we measure scholarly reputation, to the way in which we fund universities and the way in which search engines rank their results. The distorting effect of this assumption is profound. If nothing else, it leads to a perverse situation in which people will often discuss books, articles, and blog postings that they disagree with without actually citing the relevant content, just so that they can avoid inadvertently conferring “whuffie” on the item being discussed. This can’t be right.

Having said that, there has been a half-hearted attempt to introduce a gross level of link typology with the introduction of the “nofollow” link attribute, an initiative started by Google in order to try to address the increasing problem of “Spamdexing”. But this is a pretty ham-fisted form of link typing, particularly in the way it is implemented by the Wikipedia, where Crossref DOI links to formally published scholarly literature have a “nofollow” attribute attached to them but, inexplicably, items with a PMID are not so hobbled (view the HTML source of this page, for example). Essentially, this means that the Wikipedia is a black hole of reputation. That is, it absorbs reputation (through links to the Wikipedia), but it doesn’t let reputation back out again. Hell, I feel dirty for even linking to it here ;-).

Anyway, scholarly publishers should certainly read Shotton’s article because it is full of good, practical ideas about what can be done with today’s technology in order to help us move beyond the “digital incunabula” that the industry is currently churning out. The sample semantic article that Shotton’s team created is inspirational, and I particularly encourage people to look at the source file for the ontology-enhanced bibliography, which reveals just how much more useful metadata can be associated with the humble citation.

And now I wonder whether CiteULike, Connotea, 2Collab or Zotero will consider adding support for the Citation Typing Ontology into their respective services?

* Disclosure:
a) I am on the editorial board of Learned Publishing
b) Crossref has consulted with David Shotton on the subject of semantically enhancing journal articles

Poorboy Metadata Hack

Tony Hammond – 2009 January 06

In Metadata

I was playing around recently and ran across this little metadata hack. At first, I thought somebody was doing something new. But no, nothing so forward apparently. (Heh! 🙂)

I was attempting to grab the response headers from an HTTP request on an article page and was using by default the Perl LWP library. For some reason I was getting metadata elements being spewed out as response headers, at least from some of the sites I tested. With some further investigation I tracked this back to LWP itself, which parses the HTML head section and generates HTTP pseudo-headers using an X-Meta- style header. (This can be viewed either as a feature of LWP or a bug, as this article bemoans.)
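
For anyone who wants to see the idea without installing Perl, here is a rough Python imitation of that behaviour: it reads the meta elements from the head of a page and turns each one into an X-Meta- style pseudo-header. This is a sketch of the idea only, not LWP's implementation. The class name, the sample page, and the exact header casing are assumptions.

```python
from html.parser import HTMLParser


class MetaToHeaders(HTMLParser):
    """Collect <meta name=... content=...> elements as pseudo-headers."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._past_head = False  # set once <body> starts

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            # Only the document head is of interest here.
            self._past_head = True
            return
        if tag != "meta" or self._past_head:
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.headers.append((f"X-Meta-{name}", content))


# Hypothetical article page, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Example</title>
<meta name="dc.identifier" content="doi:10.1000/182">
<meta name="dc.title" content="An Example Article">
</head><body><meta name="ignored" content="x"></body></html>"""

if __name__ == "__main__":
    parser = MetaToHeaders()
    parser.feed(SAMPLE_PAGE)
    for header, value in parser.headers:
        print(f"{header}: {value}")
```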

And the DOI is …

Tony Hammond – 2008 December 22

In Metadata

Once structured metadata is added to a file then retrieving a given metadata element is usually a doddle. For example, for PDFs with embedded XMP one can use Phil Harvey’s excellent Exiftool utility.

Exiftool is a Perl library and application, which I’ve blogged about here earlier, and is available as a ‘.zip’ file for Windows (no Perl required) or a ‘.dmg’ for MacOS. Note that Phil maintains this actively and has done so over the last five years. (And when I say actively I mean just that. I once made the mistake of printing out the change file.)
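
As a rough sketch of how this might be scripted, the Python below shells out to Exiftool, asks for JSON output of the XMP tags, and picks out a tag by name. It assumes an `exiftool` executable on the PATH and assumes the DOI surfaces as a tag named `DOI` in some XMP group. The function names and that tag-name guess are mine, so check them against what Exiftool actually reports for your files.

```python
import json
import subprocess


def build_exiftool_command(pdf_path):
    """Command line asking Exiftool for XMP tags as JSON, with group names."""
    return ["exiftool", "-j", "-G1", "-XMP:all", pdf_path]


def find_tag(exiftool_json, tag_name):
    """Return the first value whose tag name matches, ignoring group and case.

    With group names switched on, keys are expected to look like
    "Group:Tag", so only the part after the last colon is compared.
    """
    records = json.loads(exiftool_json)
    if not records:
        return None
    wanted = tag_name.lower()
    for key, value in records[0].items():
        if key.rsplit(":", 1)[-1].lower() == wanted:
            return value
    return None


def read_doi(pdf_path):
    """Run Exiftool on one PDF and return its DOI tag, if any."""
    result = subprocess.run(
        build_exiftool_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return find_tag(result.stdout, "DOI")
```

Calling `read_doi("article.pdf")` (a hypothetical file name) would return the DOI string, or `None` if no matching tag is found.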

Machine Readable: Are We There Yet?

Tony Hammond – 2008 November 19

In Metadata

The guidelines for Crossref publishers (“DOI Name Information and Guidelines” - PDF, 210K) have this to say in “Sect. 6.3 The response page” regarding the response page for a DOI:

“A minimal response page must contain a full bibliographic citation displayed to the user. A response page without bibliographic information should never be presented to a user.”

which would seem to be all fine and dandy. But if that user is a machine (or an agent acting for a user) they’ll likely be out of luck as the metadata in the bibliographic citation is generally targeted at human users.

So here’s a quick and dirty implementation of what a machine readable page could look like using RDFa. (The demo uses Jeni Tennison’s wonderful rdfQuery plugin which I blogged about earlier.)

Clicking the DOI link below will bring up in a sub-window a bibliographic citation which might be found in a typical DOI response page. If you now click the “Read Me” link you should see an alert message which presents the bibliographic metadata as a complete RDF document (in a simple N3 – or Notation3 – format). This document is assembled on the fly by rdfQuery using the RDFa markup embedded in the page.

See the “View Source” link to list the actual XHTML markup and the RDFa properties which have been added. And note also that some of the properties are partially “hidden” to the human reader, e.g. a publication date is given in year form only whereas the machine record has the date in full, and some of the properties are fully “hidden”: print and electronic ISSNs, issue number, ending page, etc.
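
To make the “hidden” properties idea concrete, here is a toy Python sketch that pulls property values out of RDFa-style markup and prints them as N3-like lines. It is not rdfQuery and not a conformant RDFa processor: it handles only the `about`, `property` and `content` attributes, assumes property elements contain plain text, and does not escape quotes in values. The sample markup, DOI, and values are invented for illustration.

```python
from html.parser import HTMLParser


class RdfaSketch(HTMLParser):
    """Collect (subject, property, value) triples from simple RDFa markup."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.triples = []
        self._pending = None  # (property, text parts) while inside an element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # Machine-readable value overrides the text shown to the reader.
            self.triples.append((self.subject, prop, attributes["content"]))
        else:
            self._pending = (prop, [])

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[1].append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            prop, parts = self._pending
            self.triples.append((self.subject, prop, "".join(parts).strip()))
            self._pending = None


def to_n3_lines(triples):
    """Render triples as N3-like statements (prefix declarations omitted)."""
    return [f'<{s}> {p} "{o}" .' for s, p, o in triples]


# Hypothetical citation markup: the date is shown as a year but recorded in
# full, and the ISSN is not shown to the human reader at all.
SAMPLE = """
<div about="doi:10.1000/182">
  <span property="dc:title">An Example Article</span>
  (<span property="prism:publicationDate" content="2008-11-19">2008</span>)
  <span property="prism:issn" content="1234-5678"></span>
</div>
"""

if __name__ == "__main__":
    parser = RdfaSketch()
    parser.feed(SAMPLE)
    print("\n".join(to_n3_lines(parser.triples)))
```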

(Continues below.)

rdfQuery

Tony Hammond – 2008 November 17

In Metadata

Whaddya know? I was just on the point of blogging about the real nice demo given by Jeni Tennison at last week’s SWIG UK meeting at HP Labs in Bristol of rdfQuery (an RDF plugin for jQuery - the zip file is here). And there today on her blog I see that she has a full writeup on rdfQuery, so I’ll defer to the expert. :~)

All I can really add to that is that rdfQuery is a pretty darn cool way to add and manipulate RDFa using jQuery. Does it get any better?

PRISM 2.1

Tony Hammond – 2008 October 24

In Metadata

Yesterday a new PRISM spec (v2.1) was released for public comment. (Comment period lasts up to Dec. 3, ’08.)

Changes are listed in pages 8 and 9 of the Introduction document. Some highlights:

- New PRISM Usage Rights namespace
  - Accordingly usage of prism:copyright, prism:embargoDate, and prism:expirationDate no longer recommended
- New element prism:isbn introduced for book serials

An updated mod_prism RSS 1.0 module is available which lists all versions of PRISM specs including the forthcoming v2.1 spec. I will see about getting this added now to a more permanent location. Current version of PRISM remains at v2.0. Versions 2.0 and 2.1 are especially of interest to users of Crossref because of their support for prism:doi and prism:url and users should consider upgrading their applications, e.g. RSS feeds.

Metadata Matters

Tony Hammond – 2008 July 21

In Metadata

Andy Powell has published on Slideshare this talk about metadata - see his eFoundations post for notes. It’s 130 slides long and aims

“to cover a broad sweep of history from library cataloguing, thru the Dublin Core, Web search engines, IEEE LOM, the Semantic Web, arXiv, institutional repositories and more.”

Don’t be fooled by the length though. This is a flip through and is a readily accessible overview on the importance of metadata. Slides 86-91 might be of interest here. 😉

PRISM Press Release

Tony Hammond – 2008 July 09

In Metadata

The PRISM metadata standards group issued a press release yesterday which covered three points:

PRISM Cookbook

The Cookbook provides “a set of practical implementation steps for a chosen set of use cases and provides insights into more sophisticated PRISM capabilities. While PRISM has 3 profiles, the cookbook only addresses the most commonly used profile #1, the well-formed XML profile. All recipes begin with a basic description of the business purpose it fulfills, followed by ingredients (typically a set of PRISM metadata fields or elements), and, closes with a step-by-step implementation method with sample XMLs and illustrative images.”
