At the end of last year, we were excited to announce our renewed commitment to community and the launch of three cross-functional programs to guide and accelerate our work. We introduced this new approach to work towards better cross-team alignment, shared responsibility, and improved communication and learning, and to make more progress on the things members need.
This year, metadata development is one of our key priorities and we’re making a start with the release of version 5.4.0 of our input schema with some long-awaited changes. This is the first in what will be a series of metadata schema updates.
What is in this update?
Publication typing for citations
This is fairly simple; we’ve added a ‘type’ attribute to the citations members supply. This means you can identify a journal article citation as a journal article, but more importantly, you can identify a dataset, software, blog post, or other citation that may not have an identifier assigned to it. This makes it easier for the many thousands of metadata users to connect these citations to identifiers. We know many publishers, particularly journal publishers, do collect this information already and will consider making this change to deposit citation types with their records.
Every year we release metadata for the full corpus of records registered with us, which can be downloaded for free in a single compressed file. This is one way in which we fulfil our mission to make metadata freely and widely available. By including the metadata of over 165 million research outputs from over 20,000 members worldwide and making them available in a standard format, we streamline access to metadata about scholarly objects such as journal articles, books, conference papers, preprints, research grants, standards, datasets, reports, blogs, and more.
Today, we’re delighted to let you know that Crossref members can now use ROR IDs to identify funders in any place where you currently use Funder IDs in your metadata. Funder IDs remain available, but this change allows publishers, service providers, and funders to streamline workflows and introduce efficiencies by using a single open identifier for both researcher affiliations and funding organizations.
As you probably know, the Research Organization Registry (ROR) is a global, community-led, carefully curated registry of open persistent identifiers for research organisations, including funding organisations. Launched in 2019, it is a joint initiative led by the California Digital Library, DataCite and Crossref that fulfills the long-standing need for an open organisation identifier.
Conflict, instability and economic conditions are just some of the factors driving new migration into Europe—and European policy makers are in dispute about how to manage and cope with the implications. Everyone agrees that in order to respond to the challenges and opportunities of migration, a better understanding is required of what drives migration towards Europe, what trajectories and infrastructures facilitate migration, and what the key characteristics of different migrant flows are, in order to inform and improve policy making.
The abstract above is taken from the successful Horizon 2020[1] project proposal called CrossMigration, an initiative of IMISCOE, Europe’s largest migration research network, in which a consortium of 15 universities, think tanks and international organizations, led by Erasmus University Rotterdam, is currently designing a Migration Research Hub. The Hub is a web-based platform aimed at helping researchers and policymakers get a quick and comprehensive overview of research in the field of migration studies. This platform will also feature reports on specific fields, methodological briefing papers and other relevant content produced by the consortium.
The core of this Hub will consist of a database providing access to publications, research projects and datasets on migration drivers, infrastructures, flows, and policies, as well as on current and future migration questions, indicators and scenarios. And that’s where our metadata story starts.
At the tail end of December I had the pleasure of speaking to the four researchers and developers working on this database: Vienna-based researchers Roland Hosner and Meike Palinkas from the International Centre for Migration Policy Development (ICMPD); Bogdan Taut, CEO of YoungMinds, in Bucharest, Romania; and Nathan Levy, currently studying for his PhD at Erasmus University Rotterdam, Department of Public Administration and Sociology, Netherlands.
There are four of you. Can each of you give me a very brief introduction to yourselves and how you fit into the project?
Bogdan: I’m from YoungMinds, based in Bucharest in Romania. We were the last to join the consortium as the technical developer on the project. I am the project manager of the team, coordinating the technical development of the database.
Roland: I am a research officer with the International Centre for Migration Policy Development (ICMPD) in Vienna, and we are leading a part of this research project which deals with the population and implementation of the research database—which is core to the Migration Research Hub, and to the whole project.
Meike: I am also a research officer at ICMPD and work together with Roland. I joined the team in September this year.
Nathan: I’m part of the coordinating team of the overall project of CrossMigration. We are coordinating putting together the Migration Research Hub, the biggest part of which is the migration database. I am based at Erasmus University in Rotterdam and I work for Professor Peter Scholten who is the overall coordinator of the whole project along with Dr. Asya Pisarevskaya.
How long has the project been in progress?
Roland: It’s a two-year project that runs from March 2018 to the end of February 2020.
So it’s a two-year project and you are 10 months in, which puts you close to the halfway mark. Have you encountered any stumbling blocks that have held you back?
Bogdan: How to put this in a diplomatic way? We are all working around the clock to meet the deadline that we set ourselves and promised to deliver by. We have made the decision to produce the database in stages. Very soon we will have the beta version out, so we have something to present. Then we are going to continue populating it with more items from every record type – journal articles, datasets, books, book chapters, reports, etc. At this point the other partners in the consortium can actually use it and work with it to map the fields and find the most recent and relevant literature on their respective subtopics, such as migration drivers or migration infrastructures. In the summer, when we are confident that it is a sound and attractive tool to be released, we will make it publicly available.
Nathan: In terms of specific deliverables for the project so far, our team has developed a taxonomy for migration research to give the fields a logical structure, and to structure this research database.
How has Crossref metadata contributed to your project?
Bogdan: We began by discussing all of the sources that need to be in the database and we put together an inventory of publishers, books and book chapters, etc., that would be relevant. Part of the scope of work for YoungMinds was to find ways of extracting information and relevant content from those sources. Once we started to dig into the content we found out that there are relevant aggregators, such as Scopus, Crossref, Web of Science and so on. We actually found Crossref through a recommendation from Scopus; someone there said ‘OK, Crossref might be able to help you more’. Crossref then became one of our main sources of basic metadata for some of the content types we gather for our database, such as journals and journal articles.
Roland: The more we moved forward, the more we saw how difficult it was to get in touch with each publisher individually, with each journal individually, to try and secure an agreement with them. So it became clear to us very quickly that we would not be able to create a properly inclusive database this way, and we knew we had to look for partners and make use of existing resources. As we progressed from one conversation to the next we received a lot of advice, and that’s how we found out about Crossref. It soon became clear that Crossref was the ideal source for us because everything that has a DOI can be found in there. We knew that if we had an agreement with Crossref then our project would be half won and our database halfway built, perhaps even more. And then we would just need to fill the gaps.
Nathan: Yes, this is one of Crossref’s key strengths—rather than having individual researchers or individual projects go to each publisher to try to find the appropriate people to talk to and negotiate—you use Crossref.
Which of the metadata values are important to you, what do you extract?
Roland: We thought about this a lot at the beginning, what we wanted to include. There are certain key things that are indisputably relevant—such as titles, names of the authors, editors, the year, DOI, dataset and so on, because we always link to the original source—the publisher’s website, or the journal article website. Ideally we would include keywords and abstracts (where they are available) because the richer the information the better. We also wanted to classify the items we have according to the taxonomy the CrossMigration project has established.
Nathan: In addition, abstracts and keywords have value for us. We want to apply a logical structure into the taxonomy on migration research, but we need content in order to do that. We need something for the algorithms that YoungMinds have developed to read in order to categorize research accordingly. The body of research on migration is so great that we cannot read through every abstract that’s ever been published on migration. That’s where the value of abstracts and keywords comes in for the Taxonomizer (as we fondly refer to it!).
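For readers curious what pulling these fields out of a record might look like, here is a minimal sketch in Python. It is not the CrossMigration team's code. It assumes a work record shaped like the JSON returned by the Crossref REST API (with `DOI`, `title`, `author`, `issued`, `URL`, `type` and an optional `abstract`). The sample record, its DOI, and the helper names `strip_markup` and `summarize_work` are invented for illustration.

```python
import re
from typing import Any


def strip_markup(text: str) -> str:
    """Remove XML-style tags; abstracts may arrive wrapped in JATS markup."""
    return re.sub(r"<[^>]+>", "", text).strip()


def author_name(author: dict[str, Any]) -> str:
    """Use an organisational 'name' if present, else join given and family names."""
    if author.get("name"):
        return author["name"]
    parts = (author.get("given"), author.get("family"))
    return " ".join(p for p in parts if p)


def summarize_work(work: dict[str, Any]) -> dict[str, Any]:
    """Pick out the fields mentioned in the interview from one work record."""
    # Titles are supplied as a list; take the first entry if there is one.
    title = (work.get("title") or [""])[0]
    # Publication dates are nested as {"date-parts": [[year, month, day]]}.
    date_parts = work.get("issued", {}).get("date-parts", [[None]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    abstract = work.get("abstract")
    return {
        "doi": work.get("DOI"),
        "type": work.get("type"),
        "title": title,
        "authors": [author_name(a) for a in work.get("author", [])],
        "year": year,
        "url": work.get("URL"),  # link back to the original source
        "abstract": strip_markup(abstract) if abstract else None,
    }


# Hypothetical record, invented for this example (not a real DOI or article).
sample_work = {
    "DOI": "10.5555/example.0001",
    "type": "journal-article",
    "title": ["An invented article about migration drivers"],
    "author": [{"given": "Ada", "family": "Example"}, {"name": "Example Consortium"}],
    "issued": {"date-parts": [[2018, 3]]},
    "URL": "https://doi.org/10.5555/example.0001",
    "abstract": "<jats:p>A made-up abstract.</jats:p>",
}

record = summarize_work(sample_work)
print(record)
```

Because abstracts are optional in the metadata, the sketch returns `None` when one is missing rather than failing, which is the case Roland's wish for "more abstracts" points at.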
What else would you like to see in the REST API that isn’t there?
Roland: More abstracts! We love abstracts!
Bogdan: Our data schema contains more fields, so we need more metadata than we can find from Crossref and other sources. Basically, the publisher’s website would produce the richest data, but it is the hardest to read. We are on a quest to find more sources because our algorithm works better if it has more information.
Once it’s complete, what are your plans to roll it out to the wider world?
Bogdan: IMISCOE is the leading organization of this consortium and it is in touch with most of the migration experts in Europe, so we already have all the contacts of the relevant people in the field.
Meike: It’s a tool for helping the community, so once we have all the relevant content inside it, we believe that word will spread relatively easily.
Have you all actually met in person?
Roland: Yes! Myself and Nathan met at the project kick-off meeting in Rotterdam in March 2018, then we met at a conference in Florence in June that was partly for the consortium but also had other invited experts and scholars. That was where we met face-to-face for the first time—it was just after we signed with YoungMinds for the IT services. And we recently met at another joint conference of IMISCOE and CrossMigration called 'Towards the IMISCOE Research Infrastructure of the Future'.
[1] Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available.
Great speaking to you all and learning a bit about this important project that will help policymakers manage and cope with the implications of migration—and may possibly even help them find ways to influence it.
If you’d like to share how you use our Metadata APIs, please contact the Community team.