Applications for this position are closed.
Request for services: Software Development Contracting
Location: Remote
Duration: Until completion of the specified software.
Project summary
Our new generation of REST API features requires us to build the “Metadata Rendering Framework”. This is a subsystem that coordinates the rendering of bibliographic metadata in a variety of formats. We are looking for a contract software developer to help us build this.
The Rendering Framework should maintain a set of rendered metadata objects in S3. It should trigger re-rendering of content whenever a relevant change occurs in the database. It should also provide a simple REST API for retrieving these S3 objects. The code that renders each format already exists, so complex data modeling is not required, though an understanding of the metadata is necessary.
This module will be implemented within our existing Kotlin codebase. It will integrate with other existing software components written in Kotlin, Java and Clojure.
High-level specifications are included here for scoping purposes. We expect an iterative approach and we will supply feedback and guidance. Code will be reviewed by Crossref developers.
Deliverables
You will report to the Head of Software Development and collaborate with a member of the Product Team.
The initial scope of the project should result in the following deliverables. This may evolve as we iterate on the work. There may also be subsequent projects for which specs and deliverables will be defined and agreed upon by both parties.
- An extensible rendering framework.
- Initial implementations / integrations for four data formats (application/citeproc+json, application/vnd.crossref.member+json, application/vnd.crossref.matching.grant+json and application/vnd.crossref.matching.citation+json).
- Full tests as part of our existing BDD / Cucumber suite.
Code must meet our standards (SONAR).
Skills
- Understanding of bibliographic metadata formats such as Citeproc-JSON.
- Experience with Kotlin and Spring Boot.
- Experience with Clojure.
- Experience writing BDD tests with Cucumber.
- Familiarity with open source software development practices.
Timeline
We would like responses by 15th February. Work can commence immediately. Given the nature of software projects, we do not expect an estimate, but we anticipate that the work will take on the order of weeks.
To respond
Please send a CV and a cover letter (each no longer than two pages) describing how you meet the requirements of the contract role, together with a rate sheet or fee schedule, to jobs@crossref.org.
About Crossref
Crossref is a non-profit membership organization that exists to make scholarly communications better. We make research objects easy to find, cite, link, assess, and reuse. We’re passionate about providing open foundational infrastructure for the scholarly communications ecosystem, and we’re continuously evolving our tools and services in response to emerging needs.
Crossref is at its core a community organization with 17,500 members across 148 countries (and counting)! We’re committed to lowering barriers for global participation in the research enterprise, we’re funded by members and subscribers, and we engage regularly with them in multiple ways, from webinars to working groups.
Crossref operates and continuously develops an impressive portfolio of services, products and features to support scholarly communication and infrastructure organisations in contributing to, maintaining and preserving robust documentation of the scholarly process. From registration forms and APIs, through complex systems for linking scholarly works with references or citations, to metadata retrieval, our busy Product Team continuously develops and refines these metadata tools.
Specification outline
The following is an indicative specification for scoping purposes. We expect the code to be iteratively specified by a BDD suite.
Background
The Item Graph is the database that powers the next generation of Crossref services. “Items” in the graph are things such as Works, Members, Funders, etc. Items are also used to represent reified relationships (such as citations, which themselves have metadata).
The Item Tree Retriever is an existing module that can retrieve subgraphs from the Item Graph in connection with an Item. For example, for a Work it would retrieve citations and other assertions. It works in a generic way, following links to a given depth, using a given strategy.
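To make "following links to a given depth, using a given strategy" concrete, here is a minimal sketch of a depth-limited traversal. `ItemGraph`, `ItemTree`, `retrieveTree` and the `follow` predicate are hypothetical names chosen for illustration; they are not the existing Item Tree Retriever's API, and the in-memory adjacency map stands in for the database.

```kotlin
// Hypothetical in-memory stand-in for the Item Graph: each Item ID maps to the
// IDs of the Items it is linked to.
class ItemGraph(private val links: Map<String, List<String>>) {
    fun neighbours(itemId: String): List<String> = links[itemId].orEmpty()
}

// One node of a retrieved Item Tree.
data class ItemTree(val itemId: String, val children: List<ItemTree>)

// Follows links from rootId down to maxDepth. The `follow` predicate plays the
// role of a retrieval strategy by deciding which links are worth following.
fun retrieveTree(
    graph: ItemGraph,
    rootId: String,
    maxDepth: Int,
    follow: (from: String, to: String) -> Boolean = { _, _ -> true },
): ItemTree {
    // Items already on the current path are skipped, so cycles cannot recurse forever.
    fun build(itemId: String, depth: Int, path: Set<String>): ItemTree {
        if (depth >= maxDepth) return ItemTree(itemId, emptyList())
        val children = graph.neighbours(itemId)
            .filter { it !in path && follow(itemId, it) }
            .map { build(it, depth + 1, path + it) }
        return ItemTree(itemId, children)
    }
    return build(rootId, 0, setOf(rootId))
}
```

Passing a different `follow` predicate (for example, one that skips Member links) changes the shape of the tree without changing the traversal itself.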
Each Item may have a number of natural representations. For example, a Work could be rendered into Citeproc-JSON for end-user consumption or a specialized representation for a search index. A Member will be rendered into our existing JSON format. We have prior code to run these translations in some cases, detailed below.
The Content Rendering Framework will be a module that translates Item Trees into content representations and keeps track of them.
The Content Rendering Framework will have a registry of Media Types (also known as MIME types, as registered with IANA). The initial deliverable will include:
- application/citeproc+json
- application/vnd.crossref.member+json
- application/vnd.crossref.matching.grant+json
- application/vnd.crossref.matching.citation+json
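A minimal sketch of such a registry follows, assuming Media Types are held as lower-case `type/subtype` strings and that new types are added by registration rather than by editing a fixed list, in keeping with the "extensible" requirement. `MediaType`, `MediaTypeRegistry` and `initialRegistry` are hypothetical names, and the validation pattern is a simplification of the full media type grammar that ignores parameters.

```kotlin
// A Media Type such as "application/citeproc+json". The pattern is a
// simplification of the full grammar and does not accept parameters.
data class MediaType(val value: String) {
    init {
        require(PATTERN.matches(value)) { "Not a type/subtype media type: $value" }
    }

    companion object {
        private val PATTERN = Regex("[a-z0-9.+-]+/[a-z0-9.+-]+")
    }
}

// Hypothetical registry of the Media Types the framework can render.
class MediaTypeRegistry {
    private val types = linkedSetOf<MediaType>()

    fun register(type: MediaType) {
        types.add(type)
    }

    fun isSupported(type: MediaType): Boolean = type in types

    fun all(): List<MediaType> = types.toList()
}

// The four Media Types in the initial deliverable.
val initialRegistry = MediaTypeRegistry().apply {
    listOf(
        "application/citeproc+json",
        "application/vnd.crossref.member+json",
        "application/vnd.crossref.matching.grant+json",
        "application/vnd.crossref.matching.citation+json",
    ).forEach { register(MediaType(it)) }
}
```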
Representation Storage
The Content Rendering Framework will store rendered representations of Items in S3 object storage. It will support storing and retrieving rendered content by Item ID and Media Type. It will maintain an ETag value for each stored version so that we can easily detect when a rendered representation changes.
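The storage contract can be sketched as a small interface with an in-memory implementation standing in for S3. The names (`RepresentationStore`, `InMemoryRepresentationStore`, `contentEtag`) and the `itemId/mediaType` key layout are assumptions. The sketch computes its own SHA-256 content hash as the ETag rather than relying on the ETag S3 returns, because S3's ETag is only an MD5 digest of the data for some uploads (not, for example, multipart uploads).

```kotlin
import java.security.MessageDigest

// Hypothetical storage abstraction; a production version would wrap an S3 client.
interface RepresentationStore {
    fun put(itemId: String, mediaType: String, content: ByteArray): String
    fun get(itemId: String, mediaType: String): ByteArray?
    fun etag(itemId: String, mediaType: String): String?
}

// Content-derived ETag: identical bytes always produce the same hex string.
fun contentEtag(content: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(content).joinToString("") { "%02x".format(it) }

class InMemoryRepresentationStore : RepresentationStore {
    private class Entry(val content: ByteArray, val etag: String)

    private val objects = mutableMapOf<String, Entry>()

    // Hypothetical key layout: one object per (Item ID, Media Type) pair.
    private fun key(itemId: String, mediaType: String) = "$itemId/$mediaType"

    // Stores a copy of the content and returns its ETag.
    override fun put(itemId: String, mediaType: String, content: ByteArray): String {
        val etag = contentEtag(content)
        objects[key(itemId, mediaType)] = Entry(content.copyOf(), etag)
        return etag
    }

    override fun get(itemId: String, mediaType: String): ByteArray? =
        objects[key(itemId, mediaType)]?.content?.copyOf()

    override fun etag(itemId: String, mediaType: String): String? =
        objects[key(itemId, mediaType)]?.etag
}
```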
Renderer
The Renderer will render Item Trees into requested Media Types. It will dispatch to relevant rendering code.
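Dispatch can be as simple as a map from Media Type to renderer. This is a sketch only: `ItemRenderer`, `Renderer` and this `ItemTree` shape are hypothetical, and in practice each entry would wrap the existing rendering code (some of which may be Clojure or Java) behind the same interface.

```kotlin
// Minimal, hypothetical Item Tree: the root Item's properties plus linked subtrees.
data class ItemTree(
    val itemId: String,
    val properties: Map<String, String>,
    val children: List<ItemTree> = emptyList(),
)

// One renderer per Media Type turns an Item Tree into a representation.
fun interface ItemRenderer {
    fun render(tree: ItemTree): String
}

class Renderer(private val renderers: Map<String, ItemRenderer>) {
    fun supports(mediaType: String): Boolean = mediaType in renderers

    // Looks up the renderer registered for the Media Type and delegates to it.
    fun render(tree: ItemTree, mediaType: String): String {
        val renderer = renderers[mediaType]
            ?: throw IllegalArgumentException("No renderer registered for $mediaType")
        return renderer.render(tree)
    }
}
```

Adding a format then means registering one more entry rather than changing the dispatcher.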
Trigger
The Content Rendering Framework will keep track of Items’ rendered representations. For every representation it will indicate whether it is considered ‘stale’, i.e. potentially in need of re-rendering.
It will watch the Property and Relationship Assertion tables. When an assertion is made in connection with an Item, that Item is marked as stale and in need of re-rendering. For example, when the title of a Work changes, that Work should be marked for re-rendering. When a Member’s name changes, every Work connected to that Member should be marked for re-rendering.
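The Member example implies the framework must know which Works depend on a given Member. One way to support this, sketched below under that assumption, is to record at render time which Items appeared in each rendered Item Tree and to fan each assertion out through that reverse index. `StalenessTracker` and `AssertionEvent` are hypothetical, and for brevity staleness is tracked per Item here, whereas the specification tracks it per representation.

```kotlin
// Hypothetical event derived from a row in the Property or Relationship Assertion tables.
data class AssertionEvent(val subjectItemId: String)

class StalenessTracker {
    // Item ID -> Items whose rendered trees included it (recorded at render time).
    private val dependents = mutableMapOf<String, MutableSet<String>>()
    private val stale = linkedSetOf<String>()

    // Records that the rendered tree rooted at rootId contained these Item IDs.
    fun recordDependencies(rootId: String, containedIds: Collection<String>) {
        for (id in containedIds) dependents.getOrPut(id) { mutableSetOf() }.add(rootId)
    }

    // Marks the asserted Item, and every Item whose rendering depended on it, as stale.
    fun onAssertion(event: AssertionEvent) {
        stale += event.subjectItemId
        stale += dependents[event.subjectItemId].orEmpty()
    }

    fun isStale(itemId: String): Boolean = itemId in stale

    fun staleItems(): List<String> = stale.toList()

    // Called once a stale Item has been re-rendered.
    fun markFresh(itemId: String) {
        stale -= itemId
    }
}
```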
A continual process will re-render stale Representations. It will compare the ETag of the newly rendered content with that of the stored representation and only update storage if the re-render resulted in a change.
This process will be driven by an SQS queue and will run as a configured profile of the service, allowing us to scale out rendering on demand.
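A minimal sketch of the re-render loop follows, with Kotlin's `ArrayDeque` standing in for the SQS queue and a map standing in for S3. A real worker would instead receive messages from SQS and delete each one only after it has been processed. `RenderJob`, `StoredObject`, `ReRenderWorker` and the `render` hook are hypothetical, and the SHA-256 content hash used as the ETag is an assumption.

```kotlin
import java.security.MessageDigest

// One unit of work on the queue: re-render this Item in this Media Type.
data class RenderJob(val itemId: String, val mediaType: String)

// What the store holds for each representation.
data class StoredObject(val etag: String, val content: String)

fun etagOf(content: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(content.toByteArray())
        .joinToString("") { "%02x".format(it) }

class ReRenderWorker(
    private val render: (RenderJob) -> String,
    private val store: MutableMap<RenderJob, StoredObject>,
) {
    // Number of writes performed; renders that produce identical content do not count.
    var writes = 0
        private set

    // Processes every queued job, writing only when the ETag has changed.
    fun drain(queue: ArrayDeque<RenderJob>) {
        while (queue.isNotEmpty()) {
            val job = queue.removeFirst()
            val content = render(job)
            val etag = etagOf(content)
            if (store[job]?.etag != etag) {
                store[job] = StoredObject(etag, content)
                writes++
            }
        }
    }
}
```

Because each worker only pulls jobs and writes results, running more instances of this profile scales rendering without coordination beyond the queue itself.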
Collections
A Collection is a named set of Items. For example, “the set of Works”, “the set of Members”, etc. When Items are ingested, the ingester code can mark Items as belonging to a given set.
A Collection is also associated with a configuration that indicates the set of Media Types into which its Items should be rendered.
Various modules that are responsible for ingesting Items will indicate that Items they ingest belong to a given set.
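The Collection model can be sketched as a named set of Item IDs plus a configured set of Media Types. `ItemCollection`, `CollectionRegistry` and `addItem` are hypothetical names, and the rule that an Item belonging to several Collections needs the union of their Media Types is an assumption.

```kotlin
// A named Collection and the Media Types its Items should be rendered to.
data class ItemCollection(val name: String, val mediaTypes: Set<String>)

class CollectionRegistry(collections: List<ItemCollection>) {
    private val byName = collections.associateBy { it.name }

    // Collection name -> IDs of the Items that belong to it.
    private val members = mutableMapOf<String, MutableSet<String>>()

    // Hypothetical hook called by ingesters to mark an Item as belonging to a Collection.
    fun addItem(collectionName: String, itemId: String) {
        require(collectionName in byName) { "Unknown collection: $collectionName" }
        members.getOrPut(collectionName) { mutableSetOf() }.add(itemId)
    }

    // Media Types an Item should be rendered to: the union over its Collections.
    fun mediaTypesFor(itemId: String): Set<String> =
        members.filterValues { itemId in it }
            .keys
            .flatMap { byName.getValue(it).mediaTypes }
            .toSet()
}
```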
Versions API functionality
A simple REST API endpoint will list the versions of each Item in a given format, allowing users to see the history of a rendered Item.
A similar endpoint will be available for Collections, providing a list of updates for all Items in that Collection.
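The queries behind both endpoints can be sketched over an append-only log of render events. `Version` and `VersionLog` are hypothetical, the endpoint paths in the comments are illustrative only, and the HTTP layer (for example a Spring Boot controller) is omitted.

```kotlin
import java.time.Instant

// One stored version of a rendered representation.
data class Version(
    val itemId: String,
    val mediaType: String,
    val etag: String,
    val renderedAt: Instant,
)

class VersionLog {
    private val versions = mutableListOf<Version>()

    // Appended whenever a re-render actually changes the stored content.
    fun record(version: Version) {
        versions += version
    }

    // Backs a hypothetical GET /items/{id}/versions?format=... endpoint, oldest first.
    fun versionsOf(itemId: String, mediaType: String): List<Version> =
        versions.filter { it.itemId == itemId && it.mediaType == mediaType }
            .sortedBy { it.renderedAt }

    // Backs a hypothetical per-Collection updates feed, newest first.
    fun updatesFor(collectionItemIds: Set<String>, since: Instant = Instant.EPOCH): List<Version> =
        versions.filter { it.itemId in collectionItemIds && it.renderedAt >= since }
            .sortedByDescending { it.renderedAt }
}
```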