This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
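To make the idea concrete, here is a minimal sketch of what a typed citation might look like when built programmatically. It is an illustration only: the `citation` element, its `key` attribute, and the `unstructured_citation` child are assumed here, the value `"dataset"` is a placeholder, and `build_citation` is a hypothetical helper. Check the published 5.4.0 schema for the actual element structure and the permitted type values.

```python
import xml.etree.ElementTree as ET


def build_citation(key: str, text: str, citation_type: str) -> ET.Element:
    """Build one citation element carrying a 'type' attribute.

    Element names and the type value are assumptions for illustration,
    not a statement of what the schema allows.
    """
    citation = ET.Element("citation", {"key": key, "type": citation_type})
    unstructured = ET.SubElement(citation, "unstructured_citation")
    unstructured.text = text
    return citation


# Hypothetical example: a dataset citation with no identifier assigned.
example = build_citation("ref1", "Example Survey Dataset, 2024.", "dataset")
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

A metadata user reading such a record could filter on the type attribute to decide which matching strategy to try for a citation that has no identifier.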
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019 as a joint initiative led by the California Digital Library, DataCite, and Crossref, it fulfils the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world’s least economically advantaged countries. Eligibility for the program is based on a member’s country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
This is a joint blog post with Dario Taraborelli, coming from WikiCite 2016.
In 2014 we were taking our first steps along the path that would lead us to Crossref Event Data. At this time I started looking into the DOI resolution logs to see if we could get any interesting information out of them. This project, which became Chronograph, showed which domains were driving traffic to Crossref DOIs.
where we knew people were using DOIs but the links are more popular than we realised
By the time the ALM Workshop 2014 rolled around there was some preliminary data and we realised that Wikipedia fell into the third category. There are lots of DOIs in Wikipedia and people click them!
I met with Dario Taraborelli, head of research at the Wikimedia Foundation, and shared the data. Dario, who co-authored the Altmetrics Manifesto in 2010, has been interested in understanding how scholarly citations are used in Wikipedia. Over the years, Wikipedia contributors have made extensive use of references to the scientific literature using DOIs, and by doing so they have created a resource that today in many ways represents the “front matter to all research”. There is growing interest in the community in understanding how DOIs are being used in Wikipedia and in non-traditional scholarship.
During our discussions the subject of Wikipedia’s gradual transition to HTTPS was raised: we anticipated that this change would affect our data gathering.
Changes
When you’re reading a webpage and click on a link to another page, your web browser will usually tell the server of that second page which page you were last on. This forms the basis of trackers like Google Analytics.
In the days before HTTPS, the next site would know the full URL that you were previously on. With the change to HTTPS, this was reduced to just sending the domain name and not the full URL, or no data at all if you click from an HTTPS page to HTTP.
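The behaviour described above can be sketched as a small function. This is a simplified model, not a browser implementation: `referrer_sent` is a hypothetical helper, and the policy names are borrowed from the W3C Referrer Policy specification. Only three policies are modelled, and details such as stripping credentials from URLs are ignored.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def referrer_sent(from_url: str, to_url: str,
                  policy: str = "no-referrer-when-downgrade") -> Optional[str]:
    """Return the referrer value a browser would send, or None if it sends nothing."""
    src = urlsplit(from_url)
    dst = urlsplit(to_url)
    # A "downgrade" is a click from a secure page to an insecure one.
    downgrade = src.scheme == "https" and dst.scheme == "http"

    if policy == "no-referrer":
        return None
    if policy == "origin":
        # Only scheme and host are revealed, never the page path.
        return f"{src.scheme}://{src.netloc}/"
    if policy == "no-referrer-when-downgrade":
        if downgrade:
            return None
        # Full URL, minus the fragment.
        return urlunsplit((src.scheme, src.netloc, src.path, src.query, ""))
    raise ValueError(f"policy not modelled in this sketch: {policy}")


page = "https://en.wikipedia.org/wiki/DNA#History"
doi_link = "http://dx.doi.org/10.1000/xyz"  # placeholder DOI

print(referrer_sent(page, doi_link))            # HTTPS page to HTTP link
print(referrer_sent(page, doi_link, "origin"))  # domain only
```

In this model, a click from an HTTPS page to an HTTP DOI link sends nothing under the downgrade rule, while an explicit `origin` policy still reveals the domain without exposing which article the reader was on.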
DOI hyperlinks are just like any other hyperlink, and are mostly HTTP not HTTPS.
Up until 2015, Wikipedia was served over HTTP, only switching to HTTPS when users were logged in or if they requested it. The Wikimedia Foundation started planning to move to HTTPS, and we knew that if they did so while continuing to use HTTP DOIs, we would lose valuable research data.
A Plan
We decided that the best course of action was to try to change the DOIs in Wikipedia to use HTTPS. Simple, right?
After some further research, Dario posted a proposal on how to mitigate the impact of the HTTPS rollout, to make sure that Wikipedia can still signal its importance as a traffic source, while preserving the privacy of its users. Discussion followed and the conclusion was to change the format of every single DOI on Wikipedia, which fortunately could be done without having to edit millions of pages. You can read the full story in this post from a year ago.
The result of this effort was that well in advance of the HTTPS switchover, the DOI links were ready to continue reporting referral data.
We held our breath. Would it work? Would we lose all referral data from Wikipedia sites? In February 2016 the last piece of the puzzle fell into place as Wikipedia gained a ‘meta referrer’ tag to explicitly specify how they would like referrers to be sent: a detailed report on the effect of this change is coming up on the Wikimedia Foundation’s blog.
The results
As detailed in the last blog post, the traffic that we measured coming from Wikipedia doesn’t seem to have slowed down during 2015:
I’d call that a success! Over the period covered in the graph, Wikipedia remained a prominent non-publisher source of referral traffic to DOIs.
Looking at the balance of HTTP vs HTTPS traffic coming from wikipedia.org, the switchover was dramatic:
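A balance like this can be tallied from the referrer values recorded alongside DOI resolutions. The sketch below assumes the referrers are available as plain strings; the function name and the sample data are hypothetical, and this is not the code used to produce the figures in this post.

```python
from collections import Counter
from urllib.parse import urlsplit


def wikipedia_scheme_counts(referrers):
    """Count referrers from wikipedia.org and its subdomains, grouped by URL scheme."""
    counts = Counter()
    for ref in referrers:
        parts = urlsplit(ref)
        host = (parts.hostname or "").lower()
        # Match the bare domain and language subdomains such as en.wikipedia.org.
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            counts[parts.scheme] += 1
    return counts


# Made-up sample values, for illustration only.
sample = [
    "http://en.wikipedia.org/wiki/DNA",
    "https://en.wikipedia.org/",
    "https://de.wikipedia.org/",
    "https://example.org/",
    "",
]
counts = wikipedia_scheme_counts(sample)
print(dict(counts))
```

Grouping the same counts by day or month would show when the HTTP share drops away and the HTTPS share takes over.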
Thank you to Dario Taraborelli, Nemo (Federico Leva), Aaron Halfaker, Alex Stinson and everyone who put in this effort.
I’ll leave the last word to Dario:
It’s great to see this data. It shows that the switchover happened successfully, which better protects the privacy of our users whilst still reporting the fact that Wikipedia is a prominent source of traffic. This is important validation of the increasing role that Wikipedia plays in the education and scientific community.