Dominika Tkaczyk

Dominika joined Crossref in 2018 as a Principal R&D Developer, where she focused on metadata matching research aimed at enriching the scholarly record through the discovery of new relationships. In 2024, she became Crossref’s Director of Data Science and established the Data Science team, with a mission to explore innovative ways of using data to support the scholarly community, enrich the Research Nexus with more metadata and relationships, and develop collaborations with like-minded community initiatives. Since 2025, Dominika has served as Director of Technology, leading a unified technology team that integrates infrastructure, software development, and data science functions. Dominika holds a PhD in Computer Science from the Polish Academy of Sciences. Prior to joining Crossref, she was a researcher and a data scientist at the University of Warsaw, Poland, and a postdoctoral researcher at Trinity College Dublin, Ireland.

The more the merrier, or how more registered grants means more relationships with outputs

Dominika Tkaczyk, Rachael Lammey, Ginny Hendricks – 2023 February 22

In Grant Linking System, Research Funders, Metadata Matching

One of the main motivators for funders registering grants with Crossref is to simplify the process of research reporting with more automatic matching of research outputs to specific awards. In March 2022, we developed a simple approach for linking grants to research outputs and analysed how many such relationships could be established. In January 2023, we repeated this analysis to see how the situation changed within ten months. Interested? Read on!

Follow the money, or how to link grants to research outputs

Dominika Tkaczyk – 2022 March 22

In Grant Linking System, Linking, R&D, Metadata Matching

The ecosystem of scholarly metadata is filled with relationships between items of various types: a person authored a paper, a paper cites a book, a funder funded research. Those relationships are absolutely essential: an item without them is missing the most basic context about its structure, origin, and impact. No wonder that finding and exposing such relationships is considered very important by virtually all parties involved. Probably the most famous instance of this problem is finding citation links between research outputs. Lately, another instance has been drawing more and more attention: linking research outputs with grants used as their funding source. How can this be done and how many such links can we observe?

Double trouble with DOIs

Dominika Tkaczyk – 2020 March 10

In R&D, Metadata

Detective Matcher stopped abruptly behind the corner of a short building, praying that his loud heartbeat wouldn’t give away his presence. This missing DOI case was unlike any other before, keeping him awake for many seconds already. It took a great effort and a good amount of help from his clever assistant Fuzzy Comparison to make sense of the sparse clues provided by Miss Unstructured Reference, an elegant young lady with a shy smile, who begged him to take up this case at any cost.

Crossref metadata for bibliometrics

Ginny Hendricks, Dominika Tkaczyk, Patricia Feeney – 2020 February 21

In Metadata, Bibliometrics, Citation Data, APIs, API Case Study

Our paper, Crossref: the sustainable source of community-owned scholarly metadata, was recently published in Quantitative Science Studies (MIT Press). The paper describes the scholarly metadata collected and made available by Crossref, as well as its importance in the scholarly research ecosystem.

What’s your (citations’) style?

Dominika Tkaczyk – 2019 October 29

In Citation, R&D, Machine Learning

Bibliographic references in scientific papers are the end result of a process typically composed of: finding the right document to cite, obtaining its metadata, and formatting the metadata using a specific citation style. This end result, however, does not preserve the information about the citation style used to generate it. Can the citation style be somehow guessed from the reference string only?

TL;DR

I built an automatic citation style classifier. It classifies a given bibliographic reference string into one of 17 citation styles or “unknown”.
The classifier is based on supervised machine learning. It uses TF-IDF feature representation and a simple Logistic Regression model.
For training and testing, I used datasets generated automatically from Crossref metadata.
The accuracy of the classifier estimated on the test set is 94.7%.
The classifier is open source and can be used as a Python library or REST API.
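To make the TF-IDF plus Logistic Regression idea concrete, here is a minimal, standard-library-only sketch. It is not the actual classifier: it separates only two layouts instead of 17 styles, and the tokenisation into "shape" bigrams, the hand-written gradient descent, the toy training set, and every function name are assumptions made for illustration. The "Author, A." references are made-up placeholders.

```python
import math
import re
from collections import Counter


def tokenize(reference):
    """Turn a reference into layout bigrams: words -> 'a', numbers -> '0'."""
    shapes = []
    for token in re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", reference):
        if token[0].isalpha():
            shapes.append("a")
        elif token[0].isdigit():
            shapes.append("0")
        else:
            shapes.append(token)  # punctuation is kept as-is
    return [f"{left} {right}" for left, right in zip(shapes, shapes[1:])]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the training set."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def predict_proba(vec, weights, bias):
    """Probability of label 1 under a logistic regression model."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1 / (1 + math.exp(-z))


def log_loss(vectors, labels, weights, bias):
    """Mean negative log-likelihood of the labels."""
    losses = []
    for vec, y in zip(vectors, labels):
        p = predict_proba(vec, weights, bias)
        losses.append(-math.log(p if y == 1 else 1 - p))
    return sum(losses) / len(losses)


def train(vectors, labels, epochs=2000, lr=1.0):
    """Fit weights with plain full-batch gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for vec, y in zip(vectors, labels):
            error = predict_proba(vec, weights, bias) - y
            grad_b += error
            for t, v in vec.items():
                grad_w[t] += error * v
        bias -= lr * grad_b / len(labels)
        for t, g in grad_w.items():
            weights[t] = weights.get(t, 0.0) - lr * g / len(labels)
    return weights, bias


# Toy training set: label 0 = year in parentheses, label 1 = year before a semicolon.
# The second reference in each pair is a made-up placeholder.
training = [
    ("Threadgill-Sowder, J. (1983). Question Placement in Mathematical Word "
     "Problems. School Science and Mathematics, 83(2), 107-111", 0),
    ("Author, A. (2000). An Example Title. Example Journal, 1(2), 3-4", 0),
    ("Threadgill-Sowder J. Question Placement in Mathematical Word Problems. "
     "School Science and Mathematics 1983;83(2):107-111", 1),
    ("Author A. An Example Title. Example Journal 2000;1(2):3-4", 1),
]
labels = [label for _, label in training]
token_lists = [tokenize(ref) for ref, _ in training]
idf = fit_idf(token_lists)
vectors = [tfidf(tokens, idf) for tokens in token_lists]

initial_loss = log_loss(vectors, labels, {}, 0.0)
weights, bias = train(vectors, labels)
final_loss = log_loss(vectors, labels, weights, bias)
predictions = [int(predict_proba(v, weights, bias) >= 0.5) for v in vectors]
print(f"log loss {initial_loss:.3f} -> {final_loss:.3f}, predictions {predictions}")
```

Replacing words and numbers with placeholders is one plausible way to make the features describe the layout of a reference rather than its content; a real setup would more likely rely on an established machine learning library and far more training data.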

Introduction

Threadgill-Sowder, J. (1983). Question Placement in Mathematical Word Problems. School Science and Mathematics, 83(2), 107-111

This reference is the end result of a process that typically includes: finding the right document, obtaining its metadata, and formatting the metadata using a specific citation style. Sadly, the intermediate reference forms or the details of this process are not preserved in the end result. In general, just by looking at the reference string we cannot be sure which document it originates from, what its metadata is, or which citation style was used.
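The formatting step can be illustrated with a small sketch. The `Article` record and the two formatter functions below are hypothetical; the first layout reproduces the reference string above, while the second is an invented alternative layout, not a rendering of any particular official style.

```python
from dataclasses import dataclass


@dataclass
class Article:
    family: str
    initials: str
    year: int
    title: str
    journal: str
    volume: int
    issue: int
    pages: str


def format_year_in_parentheses(a: Article) -> str:
    """Layout of the example reference: year in parentheses after the author."""
    return (f"{a.family}, {a.initials}. ({a.year}). {a.title}. "
            f"{a.journal}, {a.volume}({a.issue}), {a.pages}")


def format_year_after_journal(a: Article) -> str:
    """Hypothetical alternative layout: year placed after the journal name."""
    return (f"{a.family} {a.initials}. {a.title}. "
            f"{a.journal} {a.year};{a.volume}({a.issue}):{a.pages}")


article = Article(
    family="Threadgill-Sowder",
    initials="J",
    year=1983,
    title="Question Placement in Mathematical Word Problems",
    journal="School Science and Mathematics",
    volume=83,
    issue=2,
    pages="107-111",
)

first = format_year_in_parentheses(article)
second = format_year_after_journal(article)
print(first)
print(second)
```

Both strings come from the same metadata record, yet neither carries any explicit marker of the formatter that produced it, which is why the style has to be inferred from the string alone.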

What if I told you that bibliographic references can be structured?

Dominika Tkaczyk – 2019 July 08

In Linking, Citation, R&D, Reference Matching, Metadata Matching

Last year I spent several weeks studying how to automatically match unstructured references to DOIs (you can read about these experiments in my previous blog posts). But what about references that are not in the form of an unstructured string, but rather a structured collection of metadata fields? Are we matching them, and how? Let’s find out.

Reference matching: for real this time

Dominika Tkaczyk – 2018 December 18

In Linking, Citation, R&D, Reference Matching, Metadata Matching

In my previous blog post, Matchmaker, matchmaker, make me a match, I compared four approaches for reference matching. The comparison was done using a dataset composed of automatically-generated reference strings. Now it’s time for the matching algorithms to face the real enemy: the unstructured reference strings deposited with Crossref by some members. Are the matching algorithms ready for this challenge? Which algorithm will prove worthy of becoming the guardian of the mighty citation network? Buckle up and enjoy our second matching battle!

Matchmaker, matchmaker, make me a match

Dominika Tkaczyk – 2018 November 12

In Linking, Citation, R&D, Reference Matching, Metadata Matching

Matching (or resolving) bibliographic references to target records in the collection is a crucial algorithm in the Crossref ecosystem. Automatic reference matching lets us discover citation relations in large document collections, calculate citation counts, H-indexes, impact factors, etc. At Crossref, we currently use a matching approach based on reference string parsing. Some time ago we realized there is a much simpler approach. And now it is finally battle time: which of the two approaches is better?

What does the sample say?

Dominika Tkaczyk – 2018 November 09

In Linking, Citation, R&D, Reference Matching

At Crossref Labs, we often come across interesting research questions and try to answer them by analyzing our data. Depending on the nature of the experiment, processing over 100M records might be time-consuming or even impossible. In those dark moments we turn to sampling and statistical tools. But what can we infer from only a sample of the data?
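As a rough illustration of what a sample can tell us, the sketch below estimates the fraction of records having some property from a random sample, together with a normal-approximation 95% confidence interval. The synthetic population, the 30% rate, the sample size, and the function name are all assumptions chosen for the example, not figures from the post.

```python
import math
import random


def estimate_proportion(sample, z=1.96):
    """Return the sample proportion and a normal-approximation confidence interval.

    `sample` is a list of 0/1 flags; z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


random.seed(0)

# Hypothetical population: 1 marks a record that has the property of interest.
population = [1] * 30_000 + [0] * 70_000

# Inspect only 1,000 randomly chosen records instead of the whole collection.
sample = random.sample(population, 1_000)
p_hat, low, high = estimate_proportion(sample)
print(f"estimate {p_hat:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The width of the interval depends on the sample size rather than on the size of the full collection, so a sample of this size would give similar precision for a much larger population.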
