FAIR Forever? Long Term Data Preservation Roles and Responsibilities

Digital preservation is a fast-moving, growing community of practice with ubiquitous relevance, but capability within it is unevenly distributed. Within the research community, digital preservation aligns closely with the FAIR principles and is delivered through a complex specialist infrastructure comprising technology, staff and policy. Capacity erodes quickly, so skills, technology, and policy need to remain fit for a changing purpose.

To address this challenge, the Digital Preservation Coalition (DPC) conducted the FAIR Forever study, commissioned by the European Open Science Cloud (EOSC) Sustainability Working Group and funded by EOSCsecretariat.eu in 2020, to assess the current strengths, weaknesses, opportunities and threats to the preservation of research data across EOSC, and the feasibility of establishing shared approaches, workflows and services that would benefit EOSC stakeholders.

The final report of the study summarises key findings on the need for clarity about digital preservation in the EOSC vision and for elucidation of roles, responsibilities, and accountabilities to mitigate risks to data, reputation, and sustainability. To better ensure that European Open Science can be FAIRer for longer, the report makes nineteen recommendations, tabulated by owner, and delineates five candidate services to address the use cases identified.

Objectives & Challenges

The FAIR Forever study addresses the need for ongoing reconnaissance and assessment of digital preservation within EOSC. Digital preservation involves the continuous interaction of policy, technology and capacity. New standards develop while older standards become obsolete, and policy objectives become fossilised and redundant. Alongside technological advances come technological challenges in managing and preserving large quantities of open data coupled with high-performance storage and computing resources. While the need for ongoing assessment and renewal of technical infrastructure is apparent, the need to assess and renew social and organisational infrastructure is frequently overlooked.

The objective of the FAIR Forever study was to assess current strengths, weaknesses, opportunities and threats to the preservation of research data across EOSC, and determine the feasibility of establishing shared approaches, workflows and services that would benefit EOSC stakeholders.

The study has allowed the DPC to make a series of statements about digital preservation in the context of EOSC through desk-based assessment, stakeholder interviews, and interactions with the broader digital preservation community through focus groups.

Main Findings

The study found that EOSC’s emerging vision, articulated most fully in the drafts of the Strategic Research and Innovation Agenda (SRIA), lacks clarity about digital preservation. Additionally, while the SRIA provides an encompassing definition of data that includes all digital outputs of research, the study found a strong (perhaps rhetorical) tendency to focus on the preservation of data, with the term often used inconsistently among EOSC stakeholders. All EOSC stakeholders should recognise the breadth of the preservation challenge implied by a broad, maximal definition of data: data sets, publications, correspondence, software, applications, libraries, code, micro-service dependencies, execution environments and operating systems will all need to be preserved or recreated, depending on scientific use cases. Software in particular is a significant digital preservation challenge, as the certification of code repositories and the validation of emulation or virtualisation services are still immature.

Undoubted strengths within the vision, including a commitment to persistent identifiers, data management planning, robust data storage, and repository certification, provide a necessary but insufficient basis to secure digital assets in the long term. Roles, responsibilities and accountabilities are opaque, and the path-dependency of digital preservation is not fully understood. Risks to EOSC's reputation and data arise from the technical complexity and uncertain accountabilities in the EOSC vision. Furthermore, additional challenges remain that are tied to existing and available resources, such as the lack of clear funding and costing models for digital preservation and the lack of specific skills and training for the various actors in and across preservation activities. Read against the classification scheme offered in the DPC Global List of Digitally Endangered Species, the EOSC vision would suggest that data in the EOSC ecosystem is ‘critically endangered’.

Main Recommendations

Key findings from the study’s interviews, interactions, and focus groups reinforce the need to elucidate roles and responsibilities for digital preservation in EOSC and to identify solutions that mitigate the risks. In particular, participants in this research emphasised the need to clarify accountabilities that are implicit but never activated within data management plans (DMPs). Based on the cumulative findings, the study makes nineteen recommendations for action, tabulated by owner so that particular responsibilities and accountabilities correspond to each. It also delineates five candidate model services, with their respective strengths and weaknesses, which would satisfy key use cases identified in the course of the research.

