This year, metadata development is one of our key priorities and we're making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we've added a "type" attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already, and we hope they will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we're delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. It's a joint initiative led by the California Digital Library, DataCite, and Crossref, launched in 2019, that fulfills the long-standing need for an open organisation identifier.
We began our Global Equitable Membership (GEM) Program to provide greater membership equitability and accessibility to organizations in the world's least economically advantaged countries. Eligibility for the program is based on a member's country; our list of countries is predominantly based on the International Development Association (IDA). Eligible members pay no membership or content registration fees. The list undergoes periodic reviews, as countries may be added or removed over time as economic situations change.
February 5, 2007, Washington DC. Crossref invited a number of people to attend an information-gathering session on the topic of Author IDs. The purpose of the meeting was to determine:
whether there is an industry need for a central or federated contributor id registry;
whether Crossref should have a role in creating such a registry;
how to proceed in a way that builds upon existing systems and standards.
In attendance: Jeff Baer, CSA; Judith Barnsby, IOPP; Geoff Bilder, Crossref; Amy Brand, Crossref; David Brown, British Library; Richard Cave, PLoS (remote); Bill Carden, ScholarOne; Gregg Gordon, SSRN; Gerry Grenier, IEEE; Michael Healy, BISG (remote); Helen Henderson, Ringgold; Thomas Hickey, OCLC (remote); Terry Hulburt, IOPP; Tim Ingoldsby, AIP; Ruth Jones, British Library; Marl Land, Parity; Dave Martinson, ACS; Georgios Papadapoulos, Atypon (with two colleagues); Jim Pringle, Thomson; Chris Rosin, Parity; Tim Ryan, Wiley; Philippa Scoones, Blackwell; Chris Shillum, Elsevier; Neil Smalheiser, UIC (remote); Barbara Tillett, LoC; Vetle Torvik, UIC (remote); Charles Trowbridge, ACS; Amanda Ward, Nature (remote); Stu Weibel, OCLC (remote); David Williamson, LoC.
Notes
Amy Brand opened the meeting and welcomed attendees. She said the goal of the meeting was really nothing more than to launch a discussion on the topic of author identifiers and hear from participants about their views and experiences on unique identifiers for individuals, be they authors, contributors, or otherwise. We went around the table and everyone introduced themselves. Amy then introduced Geoff Bilder as moderator of the meeting.
Geoffrey Bilder said that Crossref's members had indicated that they would like Crossref to explore whether it could play a role in creating an author identification system. The members feel that an "author DOI" scheme would help them with production and editorial issues. They also recognize that such a scheme could fuel numerous downstream applications. Geoff apologized for sounding like Rumsfeld and said, we know that there is a lot that we don't know, but we don't know exactly what we don't know. We have just started this project and we wanted to get some feedback from various groups concerned with scholarly publishing in order to understand what people would like to see with regard to author identification schemes and what initiatives and efforts we need to be aware of. He commented that the currently assembled group failed to include the open web community, and their input would be important too as this project develops. The meeting then turned to short project summaries from others.
Project Summaries
Jim Pringle gave a short PPT presentation (attached) and reported that Thomson first started creating its own author ids in 2000, in relation to the launch of its Highly Cited service. The focus for Thomson in this area has been on author disambiguation. Jim said that the focus for Crossref in this area would be a system that could respond to the question "who are you and what have you written"; he also raised concern about matters of author privacy.
Helen Henderson reported on the Journals Supply Chain project, a pilot that aims to discover whether the creation of a standard, commonly used identifier for Institutions (customer ids) will be beneficial to parties involved in the journal supply chain. The pilot models interactions between each party: library, publisher, agent. 35 publishers are participating thus far. Helen also said there is a clear need for sub-institutional level ids and pointed out the value of associating author and institutional ids. On the topic of institutions, Tim Ingoldsby pointed out that both academic and corporate institutions are important.
Chris Rosin said Parity is working on author merger and disambiguation as core use cases of author ids for its publisher clients. In particular, they have developed automated merging of instances into profiles, proceeding with conservative bias on what constitutes a match/merge. Parity is also looking at applying author CVs onto profiles. This will require contributors to participate, and they will need to make it as easy as possible for contributors. Chris said that authentication, trust, and privacy are key considerations; even collecting public information in one place raises privacy issues. Judith Barnsby pointed out that the UK has stronger data protection rules than the US regarding privacy.
Discussion among the group at this point in the meeting resulted in identifying two different areas in author id assignment: (1) ongoing assignment and (2) retroactive assignment. Geoff said this distinction was useful for Crossref, which could more easily address ongoing assignment via publishers working directly with authors.
Neil Smalheiser, a neuroscientist at UIC, reported on the Arrowsmith Project, a statistical model based on multiple features of the Medline database. The goal of the model is to predict the probability that any two papers are written by the same person. The project's "Authority" tool weighs criteria such as researcher affiliation, co-author names, journal title, and medical subject headings to identify the papers most likely written by a target author. For details: http://arrowsmith.psych.uic.edu/arrowsmith_uic/index.html
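To make the idea of weighing several criteria concrete, here is a minimal Python sketch of a pairwise same-author score. It is not the Arrowsmith model or the "Authority" tool: the `Paper` record, the Jaccard overlap features, the weights, the bias, and the logistic function are all assumptions chosen only for illustration, and the example records are made up.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class Paper:
    """Hypothetical record holding the four criteria named in the notes."""
    affiliation: str
    coauthors: frozenset
    journal: str
    subject_headings: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two sets, from 0.0 (disjoint or both empty) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights and bias, picked by hand for this sketch only.
WEIGHTS = {"affiliation": 2.0, "coauthors": 3.0, "journal": 1.0, "subjects": 2.0}
BIAS = -3.0


def same_author_probability(p: Paper, q: Paper) -> float:
    """Combine the weighted feature similarities and squash them into (0, 1)."""
    features = {
        "affiliation": float(p.affiliation.casefold() == q.affiliation.casefold()),
        "coauthors": jaccard(p.coauthors, q.coauthors),
        "journal": float(p.journal.casefold() == q.journal.casefold()),
        "subjects": jaccard(p.subject_headings, q.subject_headings),
    }
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-score))


# Made-up records: paper_a and paper_b share most features, paper_c shares none.
paper_a = Paper("Example University", frozenset({"Coauthor A"}),
                "Example Journal", frozenset({"Heading 1", "Heading 2"}))
paper_b = Paper("example university", frozenset({"Coauthor A", "Coauthor B"}),
                "Example Journal", frozenset({"Heading 1"}))
paper_c = Paper("Other Institute", frozenset({"Coauthor C"}),
                "Another Journal", frozenset({"Heading 3"}))

print(same_author_probability(paper_a, paper_b))  # high: most features agree
print(same_author_probability(paper_a, paper_c))  # low: no features agree
```

In a real system the weights would be estimated from data rather than set by hand, and a conservative threshold on the resulting probability would decide when two papers are attributed to the same person.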
David Williamson of LoC said he was working on name authority files, using ONIX metadata. Barbara Tillett of LoC spoke about authority files and related efforts in the library world, which uses the control number, one type of unique id. She reported that IFLA (International Federation of Library Associations) has a group working on how to share authority numbers, which has actually been in discussion since the 1970s; there is to be an IFLA-IPA meeting in April 2007. The library community is eager to share what it knows and what it has developed thus far. Barbara suggested that use of the Dublin Core format here may be the best way to go. Different communities will no doubt need different ids. What is needed in the library community is an international, multi-lingual solution, based on Unicode, connecting regional authority files. Publishers will want to take advantage of library authority files for retrospective identifications.
Thomas Hickey of OCLC mentioned the WorldCat Identity service, which summarizes information for 20 million authors searchable in WorldCat. Gerry Grenier reported that IEEE was about to implement its own author disambiguation and id system, and he offered that this metadata could be fed into a Crossref system.
Different participants had different views on whether the goal here should be a "light and non-centralized" (or federated) approach, a centralized registry with one place to link authors across all publishers, or a hybrid: a centralized source to hand out unique ids, with publisher data that could be distributed. There could also be a network of registration agencies working in a federated system. Different participants also had different views on Crossref's role. Several publishers at the meeting supported Crossref's role, especially in the STM space, whereas there was concern raised among some parties about whether Crossref was an appropriate choice for a system that will need to be "available everywhere to everybody", and others reiterated the importance of giving the academic community a voice in the development of such a service.
Discussion then turned to use cases, the question being: what problems would having an author id help you solve in your organization?
USE CASES ARTICULATED AT MEETING:
for RROs, a known use case is to facilitate distribution of monies owed to authors;
for booksellers, disambiguation in search;
to understand the provenance of documents;
search: to find works for a particular person;
self-presentation: how can I effectively present myself and my work to the world?;
cross-walks: associating various life sciences ids, such as PubChem;
identity of society members;
identity of research funding institutions;
disambiguation and attribution;
linking authors and institutions;
for enhancing the peer review system: need unique ids to share information with various departments;
to better know the value of our authors, for activities such as peer review, tracking stats on authors, article downloads, and individualized or personalized services;
with a central registry, author only has one place they have to update their information;
authors will want the information to be portable when they move from one institution to another ("where is Jeff Smith now?" is one such question);
to associate connected authors with one another;
to aggregate info on where (what institution) research is being done on a particular topic;
privacy can be enhanced with author DOIs;
sharing info from library to library;
cluster all the works of a particular person for search purposes;
stats about authors: "how many times has this author tried and been rejected from Nature?" for instance.
**NEXT STEPS: Please watch the CrossTech blog for ongoing discussion.**