PRISM Aggregator Message

Tony Hammond – 2009 May 08

In Interoperability

The new OAI-PMH interface to Nature.com sports one particular novelty which may well be of interest here: it makes use of the PRISM Aggregator Message. (For an announcement of this service see the post on our web publishing blog Nascent.)

OAI-PMH is a protocol for harvesting metadata records from a digital repository, and those records may be expressed in a variety of metadata formats. For reasons of interoperability a base metadata format (‘Dublin Core’) is mandated for all OAI-PMH implementations. The expectation is that this base format will be augmented by community-specific vocabularies.

Our natural inclination was to mirror the article descriptions which we already circulate in our RSS feeds and within our HTML pages (as META tags) and PDF files (as XMP packets). In these cases we have used open data models (e.g. RDF) with simple properties cherry-picked from the DC and PRISM namespaces. But OAI-PMH has a special ‘gotcha’ in this regard: any metadata format must allow for W3C XML Schema validation. That is, the properties need to be constrained by an XSD data model. Enter PRISM Aggregator Message (PAM).
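To make the schema requirement concrete: an OAI-PMH repository advertises each metadata format, together with the XSD that constrains it, in its response to the ListMetadataFormats verb. The sketch below parses such a response with Python's standard library. The sample XML is hand-written for illustration, and the schema and namespace URLs given for the pam entry are hypothetical placeholders, not what Nature.com actually returns.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response. The oai_dc entry is the mandated base format;
# the URLs in the pam entry are hypothetical placeholders.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>pam</metadataPrefix>
      <schema>http://example.org/hypothetical/pam.xsd</schema>
      <metadataNamespace>http://example.org/hypothetical/pam/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def metadata_formats(xml_text):
    """Map each advertised metadataPrefix to the URL of its XML Schema."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        schema = fmt.findtext("oai:schema", namespaces=OAI_NS)
        formats[prefix] = schema
    return formats


if __name__ == "__main__":
    for prefix, schema in metadata_formats(SAMPLE_RESPONSE).items():
        print(prefix, "->", schema)
```

Every format listed this way carries a schema URL, which is why an open RDF model with cherry-picked properties cannot simply be dropped in as is.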

For the longest time, I must confess, I did not ‘get’ what PAM was about. PRISM was clearly a metadata vocabulary, and yet with PAM there was all this wrangling with content, which as an academic publisher we frankly had no interest in: we already had our own journal article DTD, and for interop we were beginning to look at the NLM DTD. And then it dawned on me (albeit slowly) that the PAM DTD is the equivalent of the NLM DTD, but for trade magazine publishing, where there might not be such a strong practice of XML. And since the release of PRISM 2.0 (February 2008) there has also been a W3C XML Schema defined for PAM. (Note that the latest revision, PRISM 2.1, is about to be published, although the changes there have no bearing on this implementation.)

So, PAM defines PRISM elements to be used with XML content markup. A closer look reveals that a PAM message contains one or more articles, each with its metadata packaged into a head section and its content (if present) in a body section.

[Figure: pam-message.png]
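The message/article/head/body nesting can be sketched in a few lines of Python. The sample document below is hand-written: the namespace URIs, the exact element names and the property values are assumptions for illustration, not copied from the PAM specification or from a Nature.com response. To stay independent of those namespace details, the code matches elements by local name only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: namespace URIs, element names and values are
# assumptions; consult the PAM specification for the normative structure.
SAMPLE_PAM = """\
<pam:message xmlns:pam="http://prismstandard.org/namespaces/pam/2.0/"
             xmlns:xhtml="http://www.w3.org/1999/xhtml"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
  <pam:article>
    <xhtml:head>
      <dc:identifier>doi:10.1038/nature01234</dc:identifier>
      <dc:title>Example title</dc:title>
      <prism:publicationName>Example Journal</prism:publicationName>
    </xhtml:head>
    <xhtml:body/>
  </pam:article>
</pam:message>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def head_properties(xml_text):
    """Return one list per article head of (name, value) pairs, in document order."""
    root = ET.fromstring(xml_text)
    articles = []
    for article in root.iter():
        if local_name(article.tag) != "article":
            continue
        for section in article:
            if local_name(section.tag) == "head":
                articles.append(
                    [(local_name(e.tag), (e.text or "").strip()) for e in section]
                )
    return articles


if __name__ == "__main__":
    for number, props in enumerate(head_properties(SAMPLE_PAM), start=1):
        print("article", number)
        for name, value in props:
            print("  ", name, "=", value)
```

Because the pairs are collected in document order, the interleaving of DC and PRISM properties within a head section is preserved in the output.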

Section 4.3 of the PAM 2.0 specification lists the allowable head elements in 11 logical groupings: key elements, title, creative origin, publication, publication date, additional article ID, positional, topic, length, related content, rights & usage. Note that not all PRISM elements are supported; in fact only 43 of the 57 PRISM 2.0 elements are. Among the missing is ‘prism:endingPage’. Likewise, only 7 of the 15 DC elements are supported. Nevertheless we found that the bulk of our article descriptions could easily be accommodated within the PAM format. And because the format is constrained by a W3C XML Schema, the element ordering is prescribed, which results in an interleaving of DC and PRISM elements.

The Nature.com OAI-PMH service has two access points:

User interface:
http://www.nature.com/oai
Service endpoint:
http://www.nature.com/oai/request

So, to work an example, if we want to get the record for doi:10.1038/nature01234 (which has an OAI-PMH identifier of oai:nature.com:10.1038/nature01234) we could use this call to get the description in PAM format:

http://www.nature.com/oai/request?verb=GetRecord&identifier=10.1038/nature01234&metadataPrefix=pam

(Note that as a convenience for the user we also allow a DOI to be used directly in place of the full OAI-PMH identifier, as there is a one-to-one correspondence between the two within our repository. This simplifies cut-and-paste operations.)
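For anyone scripting against the service, a request URL like the one above can be assembled with Python's standard library. This is a minimal sketch: the helper names oai_identifier and get_record_url are made up for illustration, and the code only builds the URL without fetching it.

```python
from urllib.parse import urlencode

ENDPOINT = "http://www.nature.com/oai/request"  # service endpoint given in the post
OAI_PREFIX = "oai:nature.com:"  # identifier prefix shown in the post


def oai_identifier(doi):
    """Turn a DOI (with or without a leading 'doi:') into the full OAI-PMH identifier."""
    if doi.startswith("doi:"):
        doi = doi[len("doi:"):]
    return OAI_PREFIX + doi


def get_record_url(identifier, metadata_prefix="pam"):
    """Build a GetRecord request URL for a bare DOI or a full OAI-PMH identifier."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    # safe="/:" leaves the slash and colons unescaped so the result matches
    # the URL form used in the post.
    return ENDPOINT + "?" + urlencode(params, safe="/:")


if __name__ == "__main__":
    print(get_record_url("10.1038/nature01234"))
    print(get_record_url(oai_identifier("doi:10.1038/nature01234")))
    print(get_record_url("10.1038/nature01234", metadata_prefix="oai_dc"))
```

Dropping the safe argument gives a percent-encoded identifier (10.1038%2Fnature01234), which a server would normally decode to the same value. Passing metadata_prefix="oai_dc" requests the Dublin Core description instead of the PAM one.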

This returns the following properties (shown in document order and by PAM logical grouping):

[Figure: pam-elements.jpg]

With PAM we are thus able to replicate in OAI-PMH the same journal article descriptions that we are currently disseminating through other service/content channels.

Page owner: Tony Hammond   |   Last updated 2009-May-08