WP4 - NA4: Digital Standards & Processes [Months: 1-48]
=======================================================
NA4 will provide the standards to allow technical coordination between institutions that will drive innovation by linking
data together to improve interoperability. An important role for this action is to provide documentation that will be used
to deliver the support and policies disseminated in NA2.
NA4 will roll out several key standards to institutions and develop these further to facilitate new research and data
discovery, particularly as it relates to collection objects. For standards relating to molecular genetics, genomics and
tissue bank data, NA3 and NA4 will work closely to ensure complementarity and maintain interoperability linking
genomic data and the voucher specimens upon which these sequences are based.
----------
[Task 4.1: Biodiversity data standards: landscape analysis and compliance][1]
------------------------------------------------------------------------
The goal of this task is to improve harmonisation of data and software standards across Europe and ensure that the standards are fit for use and future-proof.
**Subtask 4.1.1: Landscape analysis of existing domain-specific data standards**
With a specific focus on collection objects, we will perform a landscape analysis of existing domain-specific data standards to determine:
1) which standards are being used in related data management, workflow, and analysis tools;
2) which elements are missing; and
3) what research needs are on the horizon.
This work will determine the extent of adoption of key standards by worldwide biodiversity information infrastructures. This will feed into a monitoring dashboard and testing framework showing institutional adoption and compliance within Europe, including monitoring progress on tasks 4.1 and 4.3. This subtask supports objectives 1 and 2 of NA4.
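As an illustration of what the compliance figures behind such a dashboard might look like, the sketch below counts how many records per institution fill a set of required terms. It is a minimal example under stated assumptions, not a specification of the testing framework: the term names are borrowed from Darwin Core purely for illustration, and `REQUIRED_TERMS`, `missing_terms` and `compliance_report` are hypothetical names. The standards and terms actually monitored would follow from the landscape analysis.

```python
# Hypothetical set of required terms; the names are borrowed from Darwin Core
# for illustration only and are an assumption of this sketch.
REQUIRED_TERMS = ("institutionCode", "catalogNumber", "scientificName")


def missing_terms(record, required=REQUIRED_TERMS):
    """Return the required terms that are absent or blank in one record."""
    return [term for term in required if not str(record.get(term) or "").strip()]


def compliance_report(records_by_institution, required=REQUIRED_TERMS):
    """Summarise, per institution, how many records fill every required term."""
    report = {}
    for institution, records in records_by_institution.items():
        total = len(records)
        compliant = sum(1 for r in records if not missing_terms(r, required))
        report[institution] = {
            "records": total,
            "compliant": compliant,
            # Avoid dividing by zero for institutions without records.
            "rate": compliant / total if total else 0.0,
        }
    return report


if __name__ == "__main__":
    sample = {
        "Institution A": [
            {"institutionCode": "A", "catalogNumber": "1", "scientificName": "Bellis perennis"},
            {"institutionCode": "A", "catalogNumber": "", "scientificName": "Quercus robur"},
        ],
    }
    print(compliance_report(sample))
```

A real framework would likely also validate value formats and controlled vocabularies; this sketch only checks presence.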
**Subtask 4.1.2: Supporting the standards development process**
Standards need constant review to react to technological developments and to the demands of new methodologies. This subtask will support community activity on standards through meetings and hackathons working on the development, dissemination, use and extension of biodiversity standards. Meeting topics will be assessed by an evaluation committee of partner organisations, and priority will be given to proposals focused on improving interoperability between collections. Areas where standards work is needed will be informed by the findings of subtask 4.1.1, but are anticipated to include standards for ecological interactions (e.g. pollination and parasitism), invasive species, phenology, and links with geospatial and biographical information. A particular focus will be the standards required to exploit the semantic interoperability of collections connected through the stable identifier framework (task 4.2). Standards relating to molecular data and collections, such as environmental samples (eDNA) and HTS library samples, will be covered by NA3 (task 3.1).
[Task 4.2: Adoption and development of the CETAF stable identifier framework][2]
------------------------------------------------------------------------
Improving adoption of the CETAF stable identifier framework will increase the interoperability of collection data and services. This will speed up the delivery of services, the discovery of data, and the analysis of Big Data.
**Subtask 4.2.1: Adoption of the CETAF stable identifier framework**
Critical to the reliability and reproducibility of research is the persistent identification and citability of source material.
The CETAF stable identifier framework [REF] has been developed to provide a globally unique identifier for every collection object. This task will roll out stable identifiers to SYNTHESYS+ partners. It will improve the implementations of the system to make them conformant with linked open data principles (see www.w3.org/tr/ld-bp and Güntsch et al., 2017) as well as community standards agreed on in task 4.1. This includes a redirection facility to human- and machine-readable representations of the specimen data. The task will also implement stable identifiers in collection networking initiatives such as JACQ and DINA, as well as for new collection types held by partner organisations. Implementation of stable identifiers has consequences for collection management, and through this task we will report on these implications from a curatorial perspective, generating collection data profiles for improved collections interoperability. Lastly, we will develop best practices and recommendations for the future development of the CETAF stable identifier framework.
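The redirection facility can be pictured as HTTP content negotiation: one stable identifier redirects to different representations depending on the `Accept` header sent by the client. The sketch below is a minimal illustration under stated assumptions, not the framework's actual implementation. The URLs, the media-type mapping and the function names `parse_accept` and `choose_redirect` are hypothetical, and the use of a 303 redirect is an assumption based on common linked data practice.

```python
# Hypothetical representations of a single specimen; the URLs are placeholders.
REPRESENTATIONS = {
    "text/html": "https://example.org/specimen/B100000001.html",
    "application/rdf+xml": "https://example.org/specimen/B100000001.rdf",
}
DEFAULT_TYPE = "text/html"  # assumption: people get the web page by default


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def choose_redirect(accept_header):
    """Return (status, location) for a request to the stable identifier."""
    for media_type in parse_accept(accept_header or ""):
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type]
        if media_type == "*/*":
            break  # client accepts anything: fall through to the default
    return 303, REPRESENTATIONS[DEFAULT_TYPE]


if __name__ == "__main__":
    print(choose_redirect("application/rdf+xml"))
    print(choose_redirect("text/html,application/xhtml+xml;q=0.9"))
```

A browser asking for `text/html` is sent to the human-readable page, while a harvester asking for `application/rdf+xml` is sent to the machine-readable record, so the same identifier can be cited in both contexts.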
**Subtask 4.2.2: Semantic specimen catalogue**
Based on a pilot system currently being prototyped by CETAF/ISTC, we will build a dynamic catalogue of Linked Open Data enabled specimens held by European collections. This catalogue will be the central index both for monitoring the availability and standards compliance of specimen data and for implementing inference mechanisms across semantic resources linked to collection items (e.g. people, taxa, and geographic features). The capabilities of this catalogue will be demonstrated with pilot applications that show how new information can be derived using semantic inference mechanisms.
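To make the idea of deriving new information concrete, the toy example below applies two rules to a handful of subject-predicate-object statements until nothing new can be added. It is only a sketch of rule-based inference in plain Python, not the catalogue's design: the predicates `collectedIn` and `locatedIn`, the identifiers and the function name `infer` are all hypothetical.

```python
def infer(triples):
    """Apply two illustrative rules until no new statements appear.

    Rule 1: (x collectedIn a) and (a locatedIn b) => (x collectedIn b)
    Rule 2: (a locatedIn b)   and (b locatedIn c) => (a locatedIn c)
    """
    facts = set(triples)
    while True:
        derived = set()
        for subject, predicate, obj in facts:
            if predicate not in ("collectedIn", "locatedIn"):
                continue
            for subject2, predicate2, obj2 in facts:
                if predicate2 == "locatedIn" and subject2 == obj:
                    derived.add((subject, predicate, obj2))
        if derived <= facts:  # fixed point reached
            return facts
        facts |= derived


if __name__ == "__main__":
    # Hypothetical statements about one specimen and two nested places.
    statements = {
        ("specimen:1", "collectedIn", "place:Dahlem"),
        ("place:Dahlem", "locatedIn", "place:Berlin"),
        ("place:Berlin", "locatedIn", "place:Germany"),
    }
    for fact in sorted(infer(statements) - statements):
        print(fact)
```

The derived statement that the specimen was collected in Germany was never recorded explicitly; it follows from linking the specimen record to a geographic resource. A production system would more plausibly rely on an RDF store and a standard reasoner than on hand-written loops.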
[Task 4.3: International Image Interoperability Framework (IIIF) API Specifications][3]
------------------------------------------------------------------------
When a CETAF stable identifier links to a specimen record that is accompanied by a high-resolution image (or images), we want to be able to view that image in a way that is independent of the hosting institution and the technologies they use. This will enable applications to be built that can browse and combine specimen images across multiple institutions in real time, creating a truly virtual collection from the user’s perspective. IIIF is a set of API specifications used widely in the libraries and archives community for this purpose. It enables combining multiple views of a single object or building composite views of objects from images stored in different locations.
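The institution-independence comes from the fixed URL pattern of the IIIF Image API, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, which any compliant server understands. The sketch below builds such URLs; the server addresses and image identifiers are hypothetical placeholders, the helper names `iiif_image_url` and `iiif_info_url` are inventions of this example, and the default `size="max"` assumes a server that supports that keyword (version 2.1 or later of the Image API).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    Reserved characters in the identifier (such as "/") are percent-encoded.
    """
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document describing the image and server features."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    # Two hypothetical institutions serving specimen images through IIIF.
    sources = [
        ("https://images.example-museum.org/iiif", "herbarium/B100000001"),
        ("https://iiif.example-garden.org/image", "E00000002"),
    ]
    for base, identifier in sources:
        # The same request (a 512 px wide rendering) works against either server.
        print(iiif_image_url(base, identifier, size="512,"))
        print(iiif_info_url(base, identifier))
```

Because the request syntax is identical for both servers, a viewer can place images from different institutions side by side without knowing anything about the software behind each endpoint.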
**Subtask 4.3.1: Exemplar IIIF implementations**
Ten institutions will work together to implement IIIF endpoints for their specimen images. This will make it substantially easier to browse specimen images held at different institutions. The institutions will meet to share lessons learned. These ten implementations will be used as exemplars in subtask 4.3.2 and subtask 4.3.3.
**Subtask 4.3.2: Integration of CETAF Identifier**
Four partners will work together to implement content negotiation approaches for IIIF browsers and CETAF identifier endpoints. The end goal will be to use CETAF Identifiers directly in IIIF browsers or develop conventions for their use.
**Subtask 4.3.3: Growing the network**
Partners involved in subtask 4.3.1 and subtask 4.3.2 will share their knowledge by producing an implementation manual for IIIF and CETAF Identifiers in our community, and will mentor other institutions in implementing them. The CETAF network will be used to reach out to the broader community and disseminate outcomes among relevant policy-making representatives.
These subtasks will only be achievable by networking knowledge and decision-making between partners to arrive at common implementation practices, as well as by learning from each other’s mistakes. We will collaborate online and organise hackathons to support this work. This task will provide services for JRA3, piloting the identification of duplicate herbarium specimens across collections and automatically providing links to online specimens.
[1]: https://osf.io/k35bx/wiki/home/
[2]: https://osf.io/ehqcb/wiki/home/
[3]: https://osf.io/n89bh/wiki/home/