A progress update and a renewed commitment to community
Looking back over 2024, we wanted to reflect on where we are in meeting our goals, and report on the progress and plans that affect you - our community of 21,000 organisational members as well as the vast number of research initiatives and scientific bodies that rely on Crossref metadata.
In this post, we will give an update on our roadmap, including what is completed, underway, and up next, and a bit about what’s paused and why. We’ll describe how we have been making resourcing and prioritisation decisions, including a revised management structure, and introduce new cross-functional program groups to collectively take the work forward more effectively.
It’s important to acknowledge that Crossref has evolved significantly from just five years ago - our member count has more than doubled from 10,000 to 21,000 organisations since 2019 and they include all kinds of organisations such as funders, universities, government bodies, NGOs, and of course scholar- and library-led publishers. The smaller organisations now collectively contribute the majority of Crossref funding. We’ve gone from 100 million records to 160 million in five years, and our metadata is retrieved more than 2 billion times monthly, quadrupling what it was five years ago.
It’s within this context that we’ve spent quite a lot of time thinking about scalability, how we collect and process feedback and contributions from many organisations, how to automate our operations, and refining the plans for the next few years.
Our strategic agenda remains the same
A few times a year we update the strategy page where there is a quadrant of projects showing what’s completed, in progress, up next, and in planning/ideas - for each strategic theme. We also link from there to our live public roadmap which shows more specifics about individual projects, including projected timelines, and is updated more frequently.
If you’ve been watching the strategy page, checking in on the public roadmap or this blog, or joining webinars and annual meetings, you’ll know that we’ve had some longstanding plans to, among other things, reduce technical debt, rebuild our metadata management system, move to the cloud, modernise our schema, support multiple languages, and partner with multiple data sources to build the Research Nexus.
You’ve heard us talk about these initiatives a lot, but you’ve not seen particularly swift action.
Moving the work forward more effectively
Earlier this year, it became clear that our almost three-year project to build a new relationships API had not worked out. The project, dubbed “manifold”, was to initially deliver data citations, and eventually replace our central metadata system, but what was prototyped didn’t scale, even with a subset of our metadata. We weren’t confident enough about the project’s timeline or costs to justifiably continue investing further time and resources.
Meanwhile, we’d barely scratched the surface of our aim to pay down technical and operational debt, and we’d also been neglecting to keep the live system up to date with the numerous metadata changes that have been queued up, waiting to be implemented.
We knew the manifold project was ambitious - our system has grown in complexity over the years. We were trying to rebuild the car while driving it (our system needed to continue to operate and be maintained by our team) while trying to design a new approach to manage the many relationships between 160+ million database records. In the years we worked on this project, we learned a lot that will inform future plans for a large system redesign.
In March this year, we decided to pause the manifold project. We apologised to our community partners for not delivering the promised data<->literature matches they hoped to use. They were frustrated but thankfully understanding.
We then resolved to focus on backend infrastructural changes, conduct cross-training so that all of our staff would become familiar with current in-use systems instead of greenfield tech (for now), and start to make a dent in the backlog of bugs and long-promised schema updates in our mainstream services.
We’re happy to report some movement on these things and some milestones that have been achieved in these areas in recent months.
Fostering a happy and dedicated team
Any kind of work can only happen when our staff are in a good place, feeling supported and comfortable to question things, and well-equipped with information, purpose, and clear priorities. In June, when the whole staff met up in person, we had some really good conversations about culture, communication, and about sharing responsibilities. Some people ran birds-of-a-feather sessions to explore the issues that had been keeping them up at night, such as authentication/security, and rebuilding the Crossref System (CS), and the team also co-created a set of prioritisation drivers that are now in use within our roadmap and planning processes.
Taking on feedback from the all-staff meeting and then the July board meeting, we thought strategically about the organisational structure Crossref would need over the next few years to reflect the growth in scope and size, and fulfil its longer term goals. We have long had an ambitious agenda but realised we didn’t yet have the capacity to do it all. So we came to the conclusion that we needed an updated team and management structure to take us through the next phase of our development.
The structural changes were concluded at the end of November. They included:
- Moving Technology under Operations, since Technology—though a vital enabler—still works in service to our mission and in support of our community, just like other operational things like board governance and finance.
- Reframing product development as Programs and Services, and reducing our workstreams from five product portfolios to three programs. We formed cross-team steering groups around clearly articulated program areas (more on those below).
- Broadening the leadership to include an Executive team and an extended Director team, and forming a Senior Management Team (SMT). These changes ensure that the collective responsibility for Crossref now rests on a wider group of experts who can back each other up and share the risk and the knowledge, rather than on just a few individuals.
- We started recruiting for directors for two new leadership positions. We’ll welcome a new Director of Programs and Services and a new Director of Technology in the new year.
- Evolving the strategic initiatives team into a data science team, integrating research & development functions throughout all teams and with the SMT taking collective responsibility for strategic initiatives.
Unfortunately, with the shift in approach for product development and by sharing responsibility for strategic initiatives and research among the wider team, we made the difficult decision that four positions would no longer work within the new structure.
A new approach: joined-up initiatives and cross-functional programs
Research has always been an important role for Crossref, but as this function had been annexed from our regular work, it became hard to coordinate strategic initiatives across the wider organisation. In recent years we inadvertently created more technical debt for ourselves, i.e., built multiple prototype tools without plans for adoption or moving them into production. Strategic initiatives, by their nature, need thorough research and high-level alignment, so we made such initiatives, things like Resourcing Crossref for Future Sustainability (RCFS) and improving the Integrity of the Scholarly Record (ISR), the responsibility of the whole senior management team.
Some useful research had been conducted, but we were never in a position to act on any of it. Particularly promising work has been in the field of metadata matching, and with the growth in the community reliance on our metadata, and attention on data quality rightly increasing, we decided to create a new data science team to be dedicated to this work, led by Dominika Tkaczyk.
We had also struggled with a traditional product management approach since all our tools and activities are interconnected, and we found we were trying to do too many things at once but not all of them very effectively. We also acknowledged that product management comes from the commercial (e.g. retail) world and is therefore designed to help companies sell/upsell, which is not our goal. So we looked to other approaches more suitable to mission-based nonprofits.
Introducing three programs
We have introduced cross-functional program management in order to work towards the following:
- better cross-team alignment
- shared responsibility
- improved communication and learning
- more progress on the things members need.
Supporting the strategic theme of co-creation, a new program, facilitated by Program Lead Lena Stoll, now manages and oversees all activities around co-creation and community trends. A cross-team steering group just began meeting regularly and will be responsible for interfaces such as reports/dashboards, record registration interfaces, connections and collaborations such as Open Funder Registry, ROR, ORCID auto-update, as well as OJS and other partner integrations. This program also includes the Crossref website and any front-end things to support other programs. And it includes ISR (the integrity of the scholarly record) and our tools in this area such as Crossmark and retraction/correction tooling, and Similarity Check for text comparisons.
Supporting the strategic theme of complete and global metadata and relationships, a new program, facilitated by Program Lead Martyn Rittman, now manages and oversees all activities relating to contributing to the Research Nexus. Working particularly closely with the metadata team, led by Patricia Feeney, this program addresses how metadata is modelled, used, enriched, and extended. Work includes our APIs, incorporating external data sources like Retraction Watch and Event Data, building out metadata matching services with the new data science team, supporting the community of metadata users with API sprints and more modern options for retrieving metadata based on usage and need.
Supporting the strategic theme of open and sustainable operations and keeping to the POSI framework, a new program, facilitated by Program Lead Sara Bowman, now manages and oversees all activities relating to making our operations more open, transparent, and sustainable. This program focuses on supporting and strengthening the core functions our members rely on and enabling future growth. It includes metadata deposit and processing, most apps for e.g. managing titles, authentication, and architectural and infrastructural projects like moving from the data centre to the AWS cloud service. This program also includes modernising our operations in general, which is not just technology but also finance and human resources, so projects like membership process automation, fee modelling and financial analyses, and business system integrations.
The Programs will start to be reflected across our website and in our communications from next year.
What are Crossref’s new prioritisation drivers?
These are the drivers that our ~40 staff co-created in June that are guiding decisions about the priorities on our roadmap. New ideas will be evaluated in the following areas:
- Encourage participation from new or under-represented communities
- Respond to and lead trends in scholarly communications
- Benefit the greatest number of members and users
- Reflect on how the community works with each other and allow members to self-serve
- Expand to support and connect relevant resource types and metadata fields
- Make it easier to create and update metadata
- Enhance metadata for completeness and accuracy
- Make it easier to retrieve and use metadata
- Automate repetitive/manual tasks
- Address technical and operational debt
- Maintain critical systems and operations and ensure their security
- Control or reduce costs - to Crossref, our community, or the environment
We’re happy to report that the changes made this year have resulted in a productive last few months of the year. As reported in our annual meeting, here is the progress update.
What’s paused
- A relationships API endpoint and, therefore, a specific data citation feed
- Manifold, the three-year effort to modernise our tech stack
- Most of the strategic initiatives prototypes that can’t yet be scaled, such as Labs API and Labs reports
What’s recently completed
- We succeeded in moving the entire Crossref corpus to an open-source database, PostgreSQL
- Fixed numerous REST API data quality issues and lots of troublesome bugs
- Schema development - support for ROR as a Funder identifier is live and currently in testing
- We automated some very manual membership and billing processes, saving hundreds of staff hours a year
- Released a new form for journal article record registration, building on the grant registration form
- Upgraded Participation Reports to include Affiliations and ROR IDs
- Launched a new API Learning Hub
Since the rest of the community stops for no Crossref product roadmap issue, we also progressed a number of community and governance initiatives:
- The Grant Linking System (GLS) reached 5 years with over 40 funders joining Crossref and registering over 130,000 grants and awards, including use of facilities and projects
- Our research for Resourcing Crossref for Future Sustainability (RCFS) with the Membership & Fees Committee is going well, and we’ll have new fee proposals for review in 2025
- The Integrity of the Scholarly Record (ISR) conversations have deepened, and we’ve formed strong relationships with editorial experts and research integrity sleuths, who are getting up to speed on our metadata. We’re also working with some sleuthing consultants to change our processes to handle deceptive member behaviour such as paper mills, cloned journals, and citation manipulation. The new data science team plays a role here, along with membership and governance.
What’s currently in focus
In our efforts to do less but do it more effectively, we have two current priorities:
- Get out of the physical data centre and into the cloud.
- Develop Schema 5.4.
These two projects are underway, involving lots of communication and learning. Since we haven’t released any schema updates in many years, all our staff are learning for the first time how a metadata schema model is interpreted in a systemic way, learning about the structure of research objects, and honing the process as they go. We’ve high hopes we’ll be in a position to release continuous metadata schema versions and catch up on the backlog over the coming years.
What’s next
- Continuous metadata development, with contributor roles up next
- Retraction Watch data integrated into the REST API so users have a single source of retraction/correction data
- Upgraded preprint matching and notifications
- Modelling more equitable fees through the RCFS projects
- Piloting a non-voting membership category
Once we’re fully in the cloud and in the groove of metadata updates, and with the support of newly-hired technology and program directors joining in the new year, we’ll turn our attention to rebuilding the central metadata system that we call the Crossref System, or “CS”, and report more on this next year.
So that was our summary of 2024 and an indication of what’s coming in 2025 and beyond; sorry it’s so long, and thanks for reading this far! Next year we’ll get back to more regular updates as the strategic agenda and the programs progress.