Deposit harvester

The deposit harvester allows you to retrieve metadata records for content that you’ve registered. The metadata retrieved is in our UNIXSD output format, which delivers the exact metadata submitted in a deposit, including any citations registered. Members (or their designated third parties) may only retrieve their own metadata.

The harvester uses Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to deliver the metadata. The verbs Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord are supported.

Ownership and retrieval restrictions - who can retrieve records?

The deposit harvester will only retrieve records for the authorized owner of the metadata records. Metadata ownership is established by the DOI prefix(es) associated with a user’s account (learn more about transferring responsibility for DOIs). Many members have one prefix and one account, but some members have multiple prefixes. For example, Member A has been assigned account abcd, which is associated with prefixes 10.xxxx, 10.yyyy, and 10.zzzz. Member A can retrieve metadata owned by prefixes 10.xxxx, 10.yyyy, and 10.zzzz using their abcd account.

Ownership of DOIs and titles often moves from member to member, so a title-owning prefix will not always match the prefix of the DOIs attached to the title. Retrieval permission is granted to the current owner, not the original depositor. For example, Member B registers identifier 10.5555/jfo.33425. Ownership of the journal and all identifiers is transferred to Member A with prefix 10.50505. The DOI is now “owned” by prefix 10.50505, and only Member A may harvest the metadata record for that identifier.

Sets

The deposit harvester supports a hierarchy of sets. The hierarchy is in three parts: <work-type>:<prefix>:<publication-id>. For example, the set J:10.12345:6789 will return metadata for a journal (J), with prefix 10.12345, and publication id 6789. The set B will return all book metadata. The set S:10.12345 will return all the series metadata associated with the 10.12345 prefix.

The work-type designators are:

  • J for journals
  • B for books and book-like works (reports, conference proceedings, standards, dissertations)
  • S for non-journal series and series-like works.

If no set is specified, the set “J” is used.
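
The set hierarchy described above can be assembled programmatically. The following is a minimal sketch, not part of the harvester itself: the function name build_set_spec and its validation rules are assumptions made for illustration. It only joins the three parts with colons and rejects a publication id given without a prefix.

```python
from typing import Optional, Union

# Work-type designators listed in this section.
WORK_TYPES = {"J", "B", "S"}


def build_set_spec(
    work_type: str = "J",
    prefix: Optional[str] = None,
    publication_id: Optional[Union[str, int]] = None,
) -> str:
    """Build a <work-type>:<prefix>:<publication-id> set string (hypothetical helper)."""
    if work_type not in WORK_TYPES:
        raise ValueError(f"unknown work type: {work_type!r}")
    if publication_id is not None and prefix is None:
        raise ValueError("a publication id requires a prefix")

    # Each level of the hierarchy is optional after the work type.
    parts = [work_type]
    if prefix is not None:
        parts.append(prefix)
    if publication_id is not None:
        parts.append(str(publication_id))
    return ":".join(parts)
```

For example, build_set_spec("S", "10.12345") produces the series set for the 10.12345 prefix, and build_set_spec("B") produces the set for all book metadata.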

Example requests

ListSets

Retrieve the list of titles owned by the prefixes assigned to your account:

https://oai.crossref.org/DepositHarvester?verb=ListSets&usr=username&pwd=password

ListRecords

Retrieve data for a prefix:

https://oai.crossref.org/DepositHarvester?verb=ListRecords&metadataPrefix=cr_unixsd&set=work-type:prefix&usr=username&pwd=password

Retrieve data for a single title:

https://oai.crossref.org/DepositHarvester?verb=ListRecords&metadataPrefix=cr_unixsd&set=work-type:prefix:title ID&usr=username&pwd=password

GetRecord

Retrieve data for a single DOI:

https://oai.crossref.org/DepositHarvester?verb=GetRecord&metadataPrefix=cr_unixsd&identifier=info:doi/DOI&usr=username&pwd=password

When using GetRecord, URL-encode the DOI value in the identifier parameter (for example, encode / as %2F).
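
As an illustration of the encoding step, here is a minimal sketch that builds a GetRecord URL with Python's standard library. The function name get_record_url is hypothetical, and the choice to encode every reserved character in the DOI (including the slash) is an assumption about what "URL encoded" requires here.

```python
from urllib.parse import quote

BASE_URL = "https://oai.crossref.org/DepositHarvester"


def get_record_url(doi: str, username: str, password: str) -> str:
    """Return a GetRecord URL with the DOI percent-encoded (hypothetical helper)."""
    # safe="" makes quote() encode "/" as well, so the whole DOI is escaped.
    encoded_doi = quote(doi, safe="")
    return (
        f"{BASE_URL}?verb=GetRecord&metadataPrefix=cr_unixsd"
        f"&identifier=info:doi/{encoded_doi}"
        f"&usr={quote(username, safe='')}&pwd={quote(password, safe='')}"
    )


if __name__ == "__main__":
    # "username" and "password" are placeholders, as in the examples above.
    print(get_record_url("10.5555/jfo.33425", "username", "password"))
```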

Identify

Use this verb to check the status of the deposit harvester (no account needed):

https://oai.crossref.org/DepositHarvester?verb=Identify

ListMetadataFormats

Lists available metadata formats (currently UNIXREF):

https://oai.crossref.org/DepositHarvester?verb=ListMetadataFormats

Request parameters

  • work-type: J for journals, B for book or conference proceeding titles, S for series
  • prefix: the owning prefix of the title being retrieved
  • title ID: the title identification number assigned by us. Title IDs are included in the ListSets response described above.
  • username and password: account details for the prefix/title being retrieved
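
The request parameters above can be combined into a URL with a small helper. This is a sketch under stated assumptions: the function name harvester_url is hypothetical, and it relies on urllib.parse.urlencode, which percent-encodes reserved characters such as the colons in the set value. It is assumed that the harvester decodes standard query-string encoding.

```python
from urllib.parse import urlencode

BASE_URL = "https://oai.crossref.org/DepositHarvester"


def harvester_url(verb: str, **params: str) -> str:
    """Build a deposit harvester URL from a verb and query parameters (hypothetical helper)."""
    query = {"verb": verb}
    query.update(params)
    # urlencode escapes each value, so spaces or symbols in a password are handled.
    return BASE_URL + "?" + urlencode(query)


if __name__ == "__main__":
    # Placeholder credentials and an example set, mirroring the requests above.
    print(harvester_url("ListSets", usr="username", pwd="password"))
    print(
        harvester_url(
            "ListRecords",
            metadataPrefix="cr_unixsd",
            set="J:10.12345:6789",
            usr="username",
            pwd="password",
        )
    )
```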

Results

Results conform to Crossref’s UNIXREF format and may contain the following root elements:

  • journal
  • book
  • conference
  • dissertation
  • report-paper
  • standard
  • sa_component
  • database
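
To see which of these root elements a harvested response contains, the elements can be tallied by name. The sketch below is illustrative only: the function name count_root_elements is hypothetical, and it matches elements by local name wherever they occur, because the surrounding envelope and XML namespaces of a real response are not described in this section.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Root element names listed in this section.
ROOT_ELEMENTS = {
    "journal", "book", "conference", "dissertation",
    "report-paper", "standard", "sa_component", "database",
}


def count_root_elements(xml_text: str) -> Counter:
    """Count elements whose local name is one of the listed root elements."""
    counts: Counter = Counter()
    for element in ET.fromstring(xml_text).iter():
        # Drop any "{namespace}" part that ElementTree puts before the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ROOT_ELEMENTS:
            counts[local_name] += 1
    return counts
```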

Using resumption tokens with the deposit harvester

Some OAI-PMH responses are too large to be returned in a single transaction. If a response contains a resumption token, you must make an additional request to retrieve the rest of the data. You must provide the account name and password with both the initial request and every subsequent resumption request; a resumption request without authentication details will fail. Learn more about resumption tokens.

Initial request

https://oai.crossref.org/DepositHarvester?verb=ListRecords&metadataPrefix=cr_unixsd&set=J:10.4102:83986&usr=username&pwd=password

Request with resumption token

https://oai.crossref.org/DepositHarvester?verb=ListRecords&metadataPrefix=cr_unixsd&set=J:10.4102:83986&usr=username&pwd=password&resumptionToken=01f7f30e-f692-4cc4-97b2-1eaf88b3f17f
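
The two requests above can be generalized into a loop that keeps requesting pages until no resumption token is returned. This is a minimal sketch, not a definitive client. The names harvest and find_resumption_token are hypothetical, the fetch argument is any caller-supplied function that takes a URL and returns the response XML as text, and the code assumes that an absent or empty resumptionToken element marks the last page, as in standard OAI-PMH.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://oai.crossref.org/DepositHarvester"


def find_resumption_token(xml_text: str) -> Optional[str]:
    """Return the resumption token in a response, or None if there is none."""
    for element in ET.fromstring(xml_text).iter():
        # Match on the local name so the XML namespace does not matter.
        if element.tag.rsplit("}", 1)[-1] == "resumptionToken":
            token = (element.text or "").strip()
            return token or None
    return None


def harvest(
    fetch: Callable[[str], str], set_spec: str, username: str, password: str
) -> Iterator[str]:
    """Yield each page of a ListRecords harvest, following resumption tokens."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "cr_unixsd",
        "set": set_spec,
        "usr": username,  # credentials are sent with every request
        "pwd": password,
    }
    while True:
        page = fetch(BASE_URL + "?" + urlencode(params))
        yield page
        token = find_resumption_token(page)
        if token is None:
            break
        # Repeat the same request with the token added, as in the example above.
        params["resumptionToken"] = token
```

A fetch function could be as simple as lambda url: urllib.request.urlopen(url).read().decode("utf-8"). Passing it in as an argument keeps the loop testable without network access.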

Page owner: Martyn Rittman   |   Last updated 2020-April-08