Continuing our blog series highlighting the uses of Crossref metadata, we talked to Ulf Kronman, Bibliometric Analyst at the National Library of Sweden, about the work they’re doing and how they’re using our REST API as part of their workflow.
Introducing the National Library of Sweden (NLS)
The NLS is a state agency with a staff of about 320 and its main offices in Stockholm. Its primary duty is to preserve Swedish cultural heritage by collecting everything printed in Sweden, which it has been doing since 1661. Nowadays the library also collects Swedish TV and radio programs, movies, videos, music, and computer games.
The National Library coordinates services and programs for all publicly funded libraries in Sweden and runs the national library catalogue system Libris and the national database for Swedish scholarly output, SwePub. The library also runs the Bibsam consortium, negotiating national subscription licenses and open access publishing agreements with publishers.
Images left to right: External and internal view of the National Library of Sweden, and Ulf Kronman, Bibliometric Analyst at NLS.
What problem is your service trying to solve?
The metadata in the national scholarly publication database SwePub is harvested from the Swedish universities’ local publication systems, where data is often entered manually by librarians and researchers. This means that the metadata can contain many omissions, synonyms, spelling variants and errors. As long as a record has a correct DOI, we can use Crossref to enhance and correct the metadata delivered to us.
Can you tell us how you are using Crossref metadata at the National Library of Sweden?
The Crossref metadata is presently used in two projects: Open APC Sweden, and our local analysis database for publication statistics, which is used in negotiations with publishers.
Open APC Sweden is a pilot project to gather data on open access publication costs (article processing charges, or APCs) from Swedish universities. The project is modelled on the Open APC initiative at Bielefeld University in Germany, which is part of the INTACT project. After APC data has been delivered to the APC system, scripts are run against the Crossref API to fetch information about publishers and journals. A description of Open APC Sweden can be found here.
When building our local analysis database for publisher statistics, we download data from the SwePub database and use the DOIs in the records for lookups against the Crossref API, which lets us add correct ISSN and publisher data. We then match the records against a list of the publisher’s serials. In this way, we can see how much Swedish researchers have been publishing with a certain publisher and use this data when negotiating conditions for open access publishing with the publisher in question.
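A DOI lookup of this kind could look roughly like the sketch below. It is not the Bielefeld script or the NLS code; the function names and the returned dictionary layout are assumptions made for illustration. It relies on the Crossref REST API's `/works/{doi}` route, whose JSON response carries the record under `message`, with `publisher`, `container-title` and `ISSN` fields.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def extract_journal_metadata(message):
    """Pick publisher, journal title and ISSNs out of one Crossref work record.

    `message` is the dict found under the "message" key of a /works/{doi}
    response. Missing fields are returned as None or an empty list.
    """
    titles = message.get("container-title") or []
    return {
        "publisher": message.get("publisher"),
        "journal": titles[0] if titles else None,
        "issns": list(message.get("ISSN") or []),
    }


def fetch_journal_metadata(doi, mailto=None):
    """Look up a single DOI in the Crossref REST API (makes a network call).

    `mailto` is optional; supplying a contact address is the documented way
    to identify yourself to the API.
    """
    url = CROSSREF_WORKS_URL + urllib.parse.quote(doi, safe="/")
    if mailto:
        url += "?mailto=" + urllib.parse.quote(mailto)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return extract_journal_metadata(payload["message"])
```

Keeping the parsing step separate from the network call makes it possible to test the enrichment logic against saved responses and to re-run it without querying the API again.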
What metadata values do you pull from the API?
In Open APC Sweden, a Python script supplied by staff at Bielefeld University is used to pull publisher names, journal names and ISSNs from the Crossref API. The result is entered into an enriched version of the APC data files delivered by the universities, and statistics are then calculated on the enriched files using an R script. The result can be seen here.
In the local analysis database, a modified copy of the Bielefeld Python script is used to add the same metadata to the records before matching them against publisher serial ISSNs.
Have you built your own interface to extract this data?
In Open APC Sweden, the Python script is developed and maintained at Bielefeld University and an exact copy is run in the Swedish project.
In the local analysis system, the Python script is somewhat modified to suit the special demands of this system.
But sometimes it is very convenient just to use the main DOI lookup to do a manual check-up of problematic records.
How often do you extract/query data?
In Open APC Sweden, usually about two to three times a month, when new datasets are delivered from the universities. In the local analysis database, lookups are usually done on a daily basis as development of the database continues.
What do you do with the metadata once it’s pulled from the API?
In Open APC Sweden, the metadata goes into the APC data files for processing of statistics. In the local analysis database, the metadata is used to match records against publisher journal ISSNs.
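The ISSN matching step can be sketched as follows. This is a minimal illustration under assumptions, not the NLS implementation: each record is assumed to be a dictionary with an `issns` list, and the publisher's serial list is assumed to be a plain list of ISSN strings. Both function names are hypothetical.

```python
from collections import Counter


def normalise_issn(issn):
    """Reduce an ISSN to eight characters: no hyphen, no spaces, uppercase X."""
    return issn.replace("-", "").replace(" ", "").upper()


def count_by_publisher_list(records, serial_issns):
    """Count how many records have at least one ISSN in the publisher's list.

    Returns a Counter with the keys "matched" and "unmatched".
    """
    wanted = {normalise_issn(issn) for issn in serial_issns}
    counts = Counter()
    for record in records:
        issns = {normalise_issn(issn) for issn in record.get("issns", [])}
        # A record matches if any of its print or electronic ISSNs is listed.
        counts["matched" if issns & wanted else "unmatched"] += 1
    return counts
```

Normalising both sides before comparing guards against differences such as a missing hyphen or a lowercase check character, which are common in manually entered data.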
What plans do you have for the future?
For Open APC Sweden, I would like to build a database system to make the project more scalable than it is when working with flat data files.
In both the SwePub system and the local analysis system, we are now using the new service oaDOI and its API to look up metadata about the open access status of publications, which we use to enrich our local systems.
What else would you like to see the REST API offer?
When we normalise the publishers’ names, the names returned are sometimes at too high or too generic a level to generate good statistics. For instance, Springer Nature is sometimes returned as Springer Nature, sometimes as Springer Science + Business Media and sometimes as Nature Publishing Group. Something similar applies to Taylor & Francis, where the parent company Informa UK Limited is returned instead of the publishing subsidiary. One thing to wish for here is that we could agree on some kind of normalisation of the publishers’ names and that Crossref could return this as a supplement to the present metadata.
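Until such a normalised name is available, one local workaround is a hand-maintained alias table. The sketch below is purely illustrative: the table, its contents and the function name are assumptions, built only from the name variants mentioned above, and mapping Informa UK Limited to Taylor & Francis is a simplification that a real table would need to verify record by record.

```python
# Hypothetical alias table: keys are lowercased name variants as returned
# in the metadata, values are the names used for local statistics.
PUBLISHER_ALIASES = {
    "springer nature": "Springer Nature",
    "springer science + business media": "Springer Nature",
    "nature publishing group": "Springer Nature",
    "informa uk limited": "Taylor & Francis",
}


def normalise_publisher(name):
    """Map a publisher name variant to its local canonical form.

    Whitespace and letter case are ignored when looking up the alias.
    Names that are not in the table are returned unchanged.
    """
    key = " ".join(name.split()).casefold()
    return PUBLISHER_ALIASES.get(key, name)
```

Returning unknown names unchanged means new variants surface in the statistics as separate rows, where they can be spotted and added to the table.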
Thanks, Ulf! If you would like to contribute a case study on the uses of Crossref Metadata APIs, please contact the Community team.