The ancient Romans performed a purification rite (“lustration”) after taking a census every five years. The term “lustrum” designated not only the animal sacrifice (“suovetaurilia”) but was also applied to the period of time itself. At Crossref, we’re not exactly in the business of sacrificial rituals. But over the weekend I thought it would be fun to dive into the metadata and look at very high-level changes during this period of time.
The first thing a census typically asks is population size. New records arrive each month, at variable rates, with 95.7 million registered to date. When the data is visualized, a rough yearly pattern emerges. (Data were collected on Mar 25, 2018; results are partial for this month.)
Each year brings with it a significant spike, an influx of new entrants, perhaps reflecting an increase in submissions at the end of the previous year. After January, volume drops dramatically and gradually rises once more over the course of the year. We see smaller spikes at the March, June, and September marks. (Since this was a brief exercise, I did not dive into any formal research conducted on the nature of publishing cycles.)
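For readers who want to reproduce a monthly head count like the one described above, here is a minimal sketch of how the request might be built against the Crossref REST API. It assumes that "new records" can be approximated by the API's created-date filters and that asking for zero rows is enough to read the total from the response; the helper names (`month_bounds`, `monthly_count_url`, `total_results`) are hypothetical and are not necessarily how the graph in this post was produced.

```python
import calendar
from urllib.parse import urlencode

# Public works endpoint of the Crossref REST API.
API_BASE = "https://api.crossref.org/works"


def month_bounds(year, month):
    """Return the first and last day of a month as ISO date strings."""
    last_day = calendar.monthrange(year, month)[1]
    return (f"{year:04d}-{month:02d}-01",
            f"{year:04d}-{month:02d}-{last_day:02d}")


def monthly_count_url(year, month, extra_filters=()):
    """Build a URL whose response reports how many records were created
    in the given month (hypothetical helper; assumes created date is the
    right notion of a 'new' record)."""
    start, end = month_bounds(year, month)
    filters = [f"from-created-date:{start}",
               f"until-created-date:{end}",
               *extra_filters]
    # rows=0 asks for the summary only, without any work records.
    query = urlencode({"filter": ",".join(filters), "rows": 0})
    return f"{API_BASE}?{query}"


def total_results(response_json):
    """Pull the record count out of a parsed /works response."""
    return response_json["message"]["total-results"]
```

Fetching each URL (for example with `urllib.request.urlopen`), parsing the JSON, and passing it to `total_results` would give one data point per month. Looping over the months of a five-year span yields the series to plot.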
Metadata Coverage
The next question is how the population breaks down into different demographics. For this, I analyzed four key sub-populations: ORCID, funding information, license, and abstract metadata. The following graph shows the percentage of new parties (i.e., works registered at Crossref containing these metadata) across the four segments.
The census graph shows extensive empty space on the top half, indicating there is ample room for continual growth in these communities. The ORCID population is expanding the fastest, followed by license and funding. Abstracts are a minority group and quite visibly need a population boost here in Crossref-land.
This view does not capture the percentages across record types, nor does it take into account the differential rate of growth between record types (e.g., journal article, book, report, conference proceeding, dissertation, dataset, component, posted content, peer review) as the Crossref corpus has grown. ORCID, funding, and license information are available for all full record types (that is, everything except components), but the distinction matters for abstracts. Abstracts are part of the metadata schema only for the record types where they apply, which excludes datasets, components, and peer reviews. All things considered, though, the relative impact on the total percentage of metadata deposited (or not deposited) is minuscule given the small number of these works.
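The coverage percentages discussed in this section boil down to simple arithmetic: the number of works carrying a given kind of metadata divided by the number of works registered in the same period. The sketch below shows one way this could be set up, using the `has-orcid`, `has-funder`, `has-license`, `has-abstract`, and `type` filters of the Crossref REST API. The mapping name, the helper functions, and the sample counts in the usage note are all illustrative assumptions, not figures or code from this analysis.

```python
# REST API filters that select works carrying each kind of metadata.
COVERAGE_FILTERS = {
    "ORCID": "has-orcid:true",
    "funding": "has-funder:true",
    "license": "has-license:true",
    "abstract": "has-abstract:true",
}


def segment_filter(segment, record_type=None):
    """Return the filter string for one segment, optionally narrowed to a
    single record type such as 'journal-article' (hypothetical helper)."""
    parts = [COVERAGE_FILTERS[segment]]
    if record_type is not None:
        parts.append(f"type:{record_type}")
    return ",".join(parts)


def coverage_percentages(total, counts):
    """Convert per-segment counts into percentages of the total number of
    works registered in the same period, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive number of works")
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}
```

With made-up counts, `coverage_percentages(1000, {"ORCID": 250, "abstract": 50})` returns `{"ORCID": 25.0, "abstract": 5.0}`. Passing a record type to `segment_filter` is one way to look at coverage per record type, which the aggregate view here does not capture.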
Calling the real demographers & cartographers
This mini-pseudo-lustrum was the result of a few hours of play. The graphs have raised more questions than answers. We welcome more serious and earnest efforts to dive into the metadata through our REST API and conduct a more detailed, reliable investigation of the size, distribution, and composition of the population. Next month, we will roll out reports on metadata coverage based on individual members.
This “play” census came out of a session with Karthik Ram, one of the founders of rOpenSci, as we talked about the struggle to build better tools for researchers. (rOpenSci is an exciting and influential non-profit that builds open source software for research with a community of users and developers and educates scientists about transparent research practices.) With each round of cocktails, it became clear that a critical subset of the issues boiled down to the problem of limited information about research publications. Why, that is what Crossref does! Indeed. Publishers register their content with Crossref and provide the metadata about the works they publish.
Over the past few years, we have been working with our members to broaden the coverage of the metadata as well as improve its quality. This issue is not exclusive to Crossref: Metadata 2020 rallies stakeholders across the research enterprise to push for change together.
To represent the full breadth and depth of the scholarly communications enterprise, Crossref aims to capture the richness of what our members publish through the content they register. So publishers, powerfully represent your services and make sure your metadata is complete and correct for discovery systems, indexing platforms, research evaluation systems, analytics tools, and the great number of Crossref metadata consumers far and wide.