Funding metadata must include the name of the funding organization and the funder identifier (where the funding organization is listed in the Registry), and should include an award/grant number or grant identifier. Funder names should only be deposited without the accompanying ID if the funder is not found in the Registry. While members can deposit the funder name without the identifier, those records will not be considered valid until the funder is added to the Registry and the records are redeposited (updated) with an ID. Until then, they will not be found using the funding filters that we support via our REST API, and will not show up in our Open Funder Registry search.
Correct nesting of funder names and identifiers is essential as it significantly impacts how funders, funder identifiers, and award numbers are related to each other.
Correct: In this example, funder “National Science Foundation” is associated with the funder identifier https://doi.org/10.13039/100000001
<fr:assertion name="funder_name">National Science Foundation
  <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
</fr:assertion>
Incorrect: Here, the funder name and funder identifier are not nested, so these assertions will be indexed as separate funders.
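For illustration, a non-nested version of the same funder might look like the sketch below. It is reconstructed from the description above rather than copied from a real deposit, and it assumes the fr prefix is declared on an enclosing element. Because the two assertions are siblings, nothing ties the identifier to the name.

```xml
<!-- Incorrect (illustrative sketch): the identifier sits beside the name instead of inside it -->
<fr:assertion name="funder_name">National Science Foundation</fr:assertion>
<fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
```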
The purpose of funder groups is to establish relationships between funders and award numbers. A funder group assertion should only be used to associate funder names and identifiers with award numbers when multiple funders are present.
Funding data deposit with one group of funders (no “fundgroup” needed):
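A minimal sketch of such a deposit, reusing the funder and award numbers that appear elsewhere on this page. It assumes the fr prefix is declared on an enclosing element. With only one funder present, the award numbers can sit directly inside the program element.

```xml
<fr:program name="fundref">
  <!-- Single funder: identifier nested inside the name -->
  <fr:assertion name="funder_name">National Science Foundation
    <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
  </fr:assertion>
  <!-- Award numbers need no fundgroup because there is only one funder -->
  <fr:assertion name="award_number">CBET-106</fr:assertion>
  <fr:assertion name="award_number">CBET-7259</fr:assertion>
</fr:program>
```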
Funding data deposit with two fundgroups:
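A short sketch of this pattern, pairing two funders and award numbers that appear elsewhere on this page. The combination is illustrative only, and the fr prefix is assumed to be declared on an enclosing element. Each fundgroup keeps an award number attached to its own funder.

```xml
<fr:program name="fundref">
  <!-- First group: funder plus its award number -->
  <fr:assertion name="fundgroup">
    <fr:assertion name="funder_name">National Science Foundation
      <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
    </fr:assertion>
    <fr:assertion name="award_number">CBET-106</fr:assertion>
  </fr:assertion>
  <!-- Second group: a different funder with its own award number -->
  <fr:assertion name="fundgroup">
    <fr:assertion name="funder_name">National Institute on Drug Abuse
      <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000026</fr:assertion>
    </fr:assertion>
    <fr:assertion name="award_number">JQY0937263</fr:assertion>
  </fr:assertion>
</fr:program>
```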
Incorrect: Here, a group is used to associate a funder name with a funder identifier. The identifier needs to be nested within the name, as described above.
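The misuse described here might look like the following sketch, which is reconstructed from the description and not taken from a real deposit. It assumes the fr prefix is declared on an enclosing element. The fundgroup wraps the name and identifier as siblings, so the nesting that links them is missing.

```xml
<!-- Incorrect (illustrative sketch): fundgroup used to pair a name with its identifier -->
<fr:program name="fundref">
  <fr:assertion name="fundgroup">
    <fr:assertion name="funder_name">National Science Foundation</fr:assertion>
    <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
  </fr:assertion>
</fr:program>
```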
Deposits using a funder_identifier that is not taken from the Open Funder Registry will be rejected.
Deposits with only funder_name (no funder_identifier) will not appear in funder search results in Open Funder Registry search or the REST API.
The <fr:program> element in the deposit schema section (see documentation) supports the import of the fundref.xsd schema (see documentation). The fundref namespace (xmlns:fr=https://www.crossref.org/fundref.xsd) must be included in the schema declaration, for example:
To accommodate integration with Crossmark, the fundref.xsd consists of a series of nested <fr:assertion> tags with enumerated name attributes. The name attributes are:
fundgroup: used to group a funder and its associated award number(s) for items with multiple funders.
funder_name: name of the funding agency as it appears in the funding Registry. Funder names that do not match those in the registry will be accepted to cover instances where the funding organization is not listed.
funder_identifier: funding agency identifier in the form of a DOI. It must be nested within the funder_name assertion. The funder_identifier must be taken from the funding Registry and cannot be created by the member. Deposits without funder_identifier do not qualify as funding records.
award_number: grant number or other fund identifier
funder_name and funder_identifier must be present in a deposit where the funding body is listed in the Open Funder Registry. Multiple funder_name, funder_identifier, and award_number assertions may be included.
A relationship between funder_identifier and funder_name is established by nesting funder_identifier within funder_name. For example, this deposit has the funder National Science Foundation with its corresponding funder identifier in the Open Funder Registry of https://doi.org/10.13039/100000001 :
<fr:assertion name="funder_name">National Science Foundation
  <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
</fr:assertion>
A relationship between a single funder_name and/or funder_identifier and an award_number is established by including the assertions within a <fr:program>. In this example, funder National Institute on Drug Abuse with funder identifier https://doi.org/10.13039/100000026 is associated with award number JQY0937263:
<fr:program name="fundref">
  <fr:assertion name="funder_name">National Institute on Drug Abuse
    <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000026</fr:assertion>
  </fr:assertion>
  <fr:assertion name="award_number">JQY0937263</fr:assertion>
</fr:program>
If multiple funder and award combinations exist, each combination should be deposited within a fundgroup to ensure that the award number is associated with the appropriate funder(s). In this example, two funding groups exist:
Funder National Science Foundation with funder identifier https://doi.org/10.13039/100000001 is associated with award numbers CBET-106 and CBET-7259, and
Funder Basic Energy Sciences, Office of Science, U.S. Department of Energy with funder identifier https://doi.org/10.13039/100006151 is associated with award number 1245-ABDS.
<fr:program name="fundref">
  <fr:assertion name="fundgroup">
    <fr:assertion name="funder_name">National Science Foundation
      <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
    </fr:assertion>
    <fr:assertion name="award_number">CBET-106</fr:assertion>
    <fr:assertion name="award_number">CBET-7259</fr:assertion>
  </fr:assertion>
  <fr:assertion name="fundgroup">
    <fr:assertion name="funder_name">Basic Energy Sciences, Office of Science, U.S. Department of Energy
      <fr:assertion name="funder_identifier">https://doi.org/10.13039/100006151</fr:assertion>
    </fr:assertion>
    <fr:assertion name="award_number">1245-ABDS</fr:assertion>
  </fr:assertion>
</fr:program>
Items with multiple funder names but no award numbers may be deposited without a fundgroup.
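As a sketch of that case, the deposit below lists two funders taken from the examples on this page, with no award numbers and no fundgroup. The pairing of these two funders is illustrative only, and the fr prefix is assumed to be declared on an enclosing element.

```xml
<fr:program name="fundref">
  <!-- No award numbers, so no fundgroup is needed to disambiguate -->
  <fr:assertion name="funder_name">National Science Foundation
    <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
  </fr:assertion>
  <fr:assertion name="funder_name">National Institute on Drug Abuse
    <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000026</fr:assertion>
  </fr:assertion>
</fr:program>
```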
At a minimum, a funding data deposit must contain a funder_name and funder_identifier assertion. Deposits with just an award_number assertion are not allowed. A funder_name, funder_identifier, and award_number should be included in deposits whenever possible.
If the funder name cannot be matched in the Registry, you may submit funder_name only, and the funding body will be reviewed and considered for addition to the official Registry. Until it is added to the Registry, the deposit will not be considered a valid funding record and will not appear in funding search or the REST API.
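A name-only deposit of this kind might look like the sketch below. The funder name is a hypothetical placeholder, not a real organization, and the fr prefix is assumed to be declared on an enclosing element.

```xml
<fr:program name="fundref">
  <!-- Hypothetical funder that is not yet in the Registry: name only, no funder_identifier -->
  <fr:assertion name="funder_name">Example Research Trust</fr:assertion>
</fr:program>
```

Once the funder has been added to the Registry, the record would need to be redeposited with the funder_identifier nested inside the funder_name assertion.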
As demonstrated in Example 3 below, items with several award numbers associated with a single funding organization should be grouped together by enclosing the funder_name, funder_identifier, and award_number(s) within a fundgroup assertion.
Some rules will be enforced by the deposit logic, including:
Nesting of the <fr:assertion> elements: the schema allows infinite nesting of the assertion element to accommodate nesting of an element within itself. Deposit code will only allow 3 levels of nesting (with attribute values of fundgroup, funder_name, and funder_identifier).
Values of different <fr:assertion> elements: funder_name, funder_identifier, and award_number may have deposit rules imposed.
Only valid funder identifiers will be accepted: the funder_identifier value will be compared against the Open Funder Registry file. If the funder_identifier is not found, the deposit will be rejected.
If funding metadata is incorrect or out-of-date, it may be updated by redepositing the metadata. Be sure to redeposit all available metadata for an item, not just the elements being updated. A DOI may be updated without resubmitting funding metadata, as previously deposited funding metadata will remain associated with the DOI.
Funding metadata may be deleted by redepositing an item with an empty <fr:program name="fundref"/> element. Submitting an empty Crossmark tag (<crossmark />) instead deletes all Crossmark data, including funding data, so use the empty <fr:program> element when only the funding data should be removed.
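As a sketch, the empty element described above would appear in the redeposited record as follows. The fr prefix is assumed to be declared on an enclosing element.

```xml
<!-- Empty program element: removes previously deposited funding data only -->
<fr:program name="fundref"/>
```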
Example 2: Funder information outside of Crossmark
The <fr:program> element captures funding data. It should be placed before the <doi_data> element. This deposit contains minimal funding data - one funder_name or one funder_identifier must be present; both are recommended.
<fr:program name="fundref">
  <fr:assertion name="funder_name">National Science Foundation
    <fr:assertion name="funder_identifier">https://doi.org/10.13039/100000001</fr:assertion>
  </fr:assertion>
</fr:program>
This example contains one funder_name and one funder_identifier. Note that the funder_identifier is nested within the funder_name assertion, establishing https://doi.org/10.13039/100000001 as the funder identifier for funder name National Science Foundation. Two award numbers are present.