This year, metadata development is one of our key priorities, and we’re making a start with the release of version 5.4.0 of our input schema, which includes some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
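To make the idea concrete, here is a minimal sketch of a citation carrying a type. It assumes the attribute is named type and sits on a citation element, as described above. The helper name build_citation, the key value, the unstructured_citation child, and the value "dataset" are illustrative assumptions rather than text taken from the schema, so check the 5.4.0 schema documentation for the permitted values before depositing.

```python
import xml.etree.ElementTree as ET


def build_citation(key, text, citation_type=None):
    """Build a <citation> element, optionally tagged with a type attribute.

    Hypothetical helper: the element and attribute names follow the
    description in this post; the allowed type values are an assumption.
    """
    citation = ET.Element("citation", {"key": key})
    if citation_type:
        # The new attribute: tells metadata users what kind of work is cited.
        citation.set("type", citation_type)
    unstructured = ET.SubElement(citation, "unstructured_citation")
    unstructured.text = text
    return citation


# Placeholder citation text, not a real reference.
example = build_citation(
    "ref1",
    "Example Author (2024). Example dataset title. Example Repository.",
    citation_type="dataset",
)
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

A citation built without a citation_type simply omits the attribute, which mirrors how untyped citations are supplied today.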
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It’s a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
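For workflows that will now see both kinds of funder identifier, a small check like the following can tell them apart. This is a rough sketch under stated assumptions: the function name classify_funder_identifier is hypothetical, the ROR pattern only checks the general shape of a ROR ID (a leading 0, six characters, two digits) without verifying its checksum, and Funder IDs are assumed to be DOIs under the 10.13039 prefix with a numeric suffix.

```python
import re

# Shape check only: ROR IDs are a 0, six base32-style characters, and two
# check digits. The checksum itself is NOT verified in this sketch.
ROR_PATTERN = re.compile(r"^https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$")

# Assumption: Funder IDs are DOIs with the 10.13039 prefix, optionally
# written as a doi.org URL.
FUNDER_DOI_PATTERN = re.compile(r"^(https?://(dx\.)?doi\.org/)?10\.13039/\d+$")


def classify_funder_identifier(identifier):
    """Return 'ror', 'funder_id', or 'unknown' for a funder identifier string."""
    value = identifier.strip()
    if ROR_PATTERN.match(value):
        return "ror"
    if FUNDER_DOI_PATTERN.match(value):
        return "funder_id"
    return "unknown"


# Identifiers used for illustration only.
for candidate in (
    "https://ror.org/02twcfp32",
    "https://doi.org/10.13039/100000001",
    "not-an-identifier",
):
    print(candidate, "->", classify_funder_identifier(candidate))
```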
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
As a follow-up to our blog posts on the Crossref REST API, we talked to SHARE about the work they’re doing and how they’re employing Crossref metadata as a piece of the puzzle. Cynthia Hudson-Vitale from SHARE explains in more detail…
Cynthia Hudson-Vitale, digital data librarian in Research Data and GIS Services at Washington University in St. Louis Libraries and visiting program officer for SHARE
SHARE (http://share-research.org) is building a free, open data set about research and scholarly activities across their life cycle. It is a higher education initiative whose mission is to maximize research impact by making research widely accessible, discoverable, and reusable. SHARE’s data set is free, openly licensed, and built with open source technology developed at the Center for Open Science (COS). Launched in beta in April 2015, the data set has grown to more than 6 million records from 100+ providers, including Crossref, Social Science Research Network (SSRN), DataONE, 50+ library institutional repositories, and more.
How is the Crossref REST API used within SHARE?
SHARE currently harvests metadata from Crossref using the Crossref application programming interface (API). We pull such metadata values as journal title, author, DOI, journal name, and publisher, to name just a few. This metadata is then fed into our data processing pipeline, normalized, and aggregated into the full data set.
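As a rough illustration of the normalization step described above, the sketch below flattens one work record, in the JSON shape the Crossref REST API returns, into a simple dictionary. It is not SHARE's actual pipeline: the function name normalize_work and the output field names are hypothetical, and the sample record is a made-up placeholder rather than real metadata. A live record could be fetched from https://api.crossref.org/works/{doi} and its "message" object passed in the same way.

```python
def normalize_work(work):
    """Flatten a Crossref-style work record into a simple dict.

    Input keys ("DOI", "title", "container-title", "publisher", "author")
    follow the Crossref REST API works JSON; output keys are illustrative.
    """

    def first(values):
        # Crossref returns titles and container titles as lists.
        return values[0] if values else None

    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in work.get("author", [])
    ]
    return {
        "doi": work.get("DOI"),
        "title": first(work.get("title", [])),
        "journal": first(work.get("container-title", [])),
        "publisher": work.get("publisher"),
        "authors": authors,
    }


# Placeholder record for illustration only (not real metadata).
sample_work = {
    "DOI": "10.5555/12345678",
    "title": ["An Example Article Title"],
    "container-title": ["Journal of Examples"],
    "publisher": "Example Publisher",
    "author": [
        {"given": "Ada", "family": "Example"},
        {"family": "Consortium"},
    ],
}

record = normalize_work(sample_work)
print(record)
```

Missing fields come back as None or an empty list rather than raising an error, which keeps an aggregation step simple when records from different providers are incomplete.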
What are the future plans for SHARE?
Phase II of SHARE, launched in late 2015, focuses on adding metadata providers, enhancing the metadata, and making connections and links between the metadata records. These links will show the entire life cycle of research and scholarship—connecting a data management plan, grant award information, data deposits, analytic/software code, pre-publications, final manuscripts, and more.
To move these plans forward, SHARE is applying machine-learning and automation techniques and working with the community to verify metadata enhancements and curate the metadata. Current technology work focuses on imputing subject domain keywords and object types into the SHARE data set using learning models and heuristics. Data models and schemas are in development to connect the research lifecycle, connect multiple instances of an object to a single entity, and capture metadata provenance.
What else would SHARE like to see in Crossref metadata?
We would love to see rights-declaration metadata elements and article references/citations included in the metadata about digital objects. The rights-declaration information is invaluable for individuals who want to know what category the object is in (public domain, copyrighted, etc.), what constraints or permission requirements exist, contact information, and more. Additionally, networks of research can be discovered and meta-scholarship facilitated by making article reference lists machine-readable and openly available.
What’s next?
Does this give you any ideas? Feel free to get in touch with questions, or take the API for a spin yourself and let us know what you can do with it!