Continuing our blog series highlighting the uses of Crossref metadata, we talked to the team behind the new search and discovery tool Dimensions: Daniel Hook, Digital Science CEO; Christian Herzog, ÜberResearch CEO; and Simon Porter, Director of Innovation. They talk about the work they’re doing, their collaborative approach, and how Dimensions uses the Crossref REST API, as part of our Metadata Plus service, to augment other data and their workflow.
Introducing Dimensions
Dimensions is a next-generation approach to discovering, connecting with and contextualising research. Modern academics need data about the research ecosystem in which they exist as much as the administrators who develop institutional research strategies. All academics are now required to think long-range about their research projects, contextualise their research, and demonstrate the impact of their program. Additionally, they need to find funding, ensure that students go on to good positions, and hire talented colleagues whose skills fit well with ongoing projects. Dimensions gives the first fully-linked view of publications, grants, patents and clinical trials in an analytically-centred user experience.
How is Crossref data used within Dimensions?
For an article to appear in Dimensions it must have a Crossref DOI, so it would not be possible to create Dimensions’ Publication index without Crossref’s data. Dimensions is built on several principles that we’ve talked about before. Here the most relevant of those principles are:
unique identifiers should underlie everything that we do;
data should be inclusive, and the tool should allow the user to select what they want to see;
data should be more available to our community;
data should be presented with as much contextual information as possible;
the community should have enough data available to be able to create and experiment with their own metrics and indicators.
In the context of these principles, Crossref makes a perfect starting place to create a tool like Dimensions. We use the Crossref data to know about our possible “universe” of articles. We then enhance the Crossref core with data from several different places: open access publications in the DOAJ, PubMed, bioRxiv, and through relationships with publishers. In all, 60 million of the 95 million articles in the Dimensions index have a full text version that we can text and data mine for additional information.
In Dimensions’ enhancement stage we can extract address information (where not included in the original Crossref record) and map it to GRID. We can also extract funding information and map it to the list of funders in Crossref’s Funder Registry, as well as to our database of grants in Dimensions.
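To make the funder-mapping step more concrete, here is a minimal Python sketch that reads the funder entries from a Crossref work record. It is an illustration only, not Dimensions’ own pipeline. It assumes the public REST API endpoint `https://api.crossref.org/works/{doi}` and the `funder`, `name`, `DOI` and `award` fields of the returned record; the function names, the `mailto` address you would pass in, and every value in the sample record are made up for the example.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str, mailto: str) -> dict:
    """Fetch one work record from the public Crossref REST API (needs network)."""
    url = (CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
           + "?" + urllib.parse.urlencode({"mailto": mailto}))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def funder_links(work: dict) -> List[Tuple[str, Optional[str], List[str]]]:
    """Return (funder name, Funder Registry DOI or None, award numbers) per funder."""
    return [
        (f.get("name", ""), f.get("DOI"), list(f.get("award", [])))
        for f in work.get("funder", [])
    ]


# Hand-written record in the shape of a Crossref "message"; all values are invented.
sample_work = {
    "DOI": "10.5555/example.1",
    "funder": [
        {"name": "Example Science Foundation",
         "DOI": "10.13039/000000000000",
         "award": ["ABC-123"]},
        {"name": "Unregistered Trust"},  # no identifier supplied by the publisher
    ],
}

for name, funder_id, awards in funder_links(sample_work):
    status = funder_id or "no Funder Registry ID, needs matching"
    print(name, "|", status, "|", ", ".join(awards))
```

Funders that arrive without an identifier are the ones an enhancement stage would still have to match by name, for example against full text or a grants database.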
How have you incorporated citation data?
Access to citations has historically been a thorny issue for citation databases. However, I4OC celebrated its first anniversary in April this year, and this project has been a key driver in helping us to build Dimensions with the level of citation coverage that we managed. It is a fantastic enabling initiative and should be warmly welcomed by the sector. Crossref is not the only source we were able to use to gather citation data; some text mining was needed to get a full graph. Dimensions goes beyond inter-article citations and includes links between patents and publications, links between clinical trials and publications, and Altmetric mentions of publications.
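As a rough illustration of how open citation data can feed a citation graph, the Python sketch below turns the reference list of a Crossref work record into citing-to-cited DOI pairs. It is not Dimensions’ implementation. It assumes the `reference`, `DOI`, `unstructured` and `key` fields found in Crossref REST API work records; the function name and the sample record are hypothetical.

```python
from typing import List, Tuple


def citation_edges(work: dict) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a work's references into DOI-to-DOI edges and unresolved citations.

    DOIs are compared case-insensitively, so both ends are lowercased.
    """
    citing = work["DOI"].lower()
    edges: List[Tuple[str, str]] = []
    unresolved: List[str] = []
    for ref in work.get("reference", []):
        cited = ref.get("DOI")
        if cited:
            edges.append((citing, cited.lower()))
        else:
            # No identifier deposited: keep the free text for later matching.
            unresolved.append(ref.get("unstructured") or ref.get("key", ""))
    return edges, unresolved


# Hand-written record in the shape of a Crossref "message"; all values are invented.
sample_work = {
    "DOI": "10.5555/Example.1",
    "reference": [
        {"key": "ref1", "DOI": "10.5555/EXAMPLE.2"},
        {"key": "ref2", "unstructured": "A. Author, An unlinked report, 2015."},
    ],
}

edges, unresolved = citation_edges(sample_work)
print(edges)
print(unresolved)
```

The references that come back unresolved are the ones where text mining or other matching is still needed to complete the graph.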
Is Dimensions openly available?
Given that there is so much open data in Dimensions, it was always our intention to give a free version to the community. If you visit http://app.dimensions.ai then you’ll be able to play with the system and use it for your research. While only the publications index is fully open, when you see a link to a grant, patent or clinical trial in an article detail page, you’ll be able to navigate to that record so that you can see the full context of the data.
Beyond linking publications, Dimensions also displays the CV information that a researcher has made publicly visible.
Most recently, we’ve integrated ORCID into Dimensions. This means that you can push data from Dimensions into ORCID if you connect your ORCID account to your Dimensions account.
What are the future plans for Dimensions?
Dimensions is still moving quickly and adding more functionality. Our aim is to release more data facets very soon. We plan to add a Policy Document archive and a Research Data archive. We’ve already found some fascinating insights from joining the existing data together and these two new archives should add even more interesting data.
What else would Dimensions like to see in Crossref metadata?
Open access information is something that we work with Unpaywall to source for Dimensions right now. It would be great if Crossref and Unpaywall could work together to make this data higher quality and more ubiquitous.
Thank you, Daniel, Christian and Simon.
If you would like to contribute a case study on the uses of Crossref Metadata APIs please contact the Community team.