Abstract

Good data curation is integral to cohort studies, but it is not always carried out to the level needed to ensure the longevity of the data a study holds. In this opinion paper, we introduce the concept of data curation debt: the data curation equivalent of the software engineering principle of technical debt. Using the context of UK cohort studies, we define data curation debt, describe examples and consider their potential impact. We highlight that accruing this debt can make it more difficult to use the data in the future. Additionally, because cohort studies are long-running, interest accrues on this debt and compounds over time, increasing the impact the debt could have on a study and its stakeholders. The primary causes of data curation debt are discussed across three categories: longevity of hardware, software and data formats; funding; and skills shortages. Based on cross-domain best practice, we propose strategies to reduce the debt and measures to prevent it, giving particular importance to the recognition and transparent reporting of data curation debt. By describing the debt in this way, we encapsulate a multi-faceted issue in simple terms that all cohort study stakeholders can understand. Data curation debt is not confined to the UK; it is an issue the international community must be aware of and address. This paper aims to stimulate discussion between cohort studies and their stakeholders on how to address data curation debt. If left unchecked, data curation debt could make highly valued cohort study data impossible to use, and it ultimately represents an existential risk to the studies themselves.

Highlights

  • Software engineering has a well-defined concept of technical debt,[1] first described in 1992.[2] It gives an indication of the work required to re-engineer software when a suboptimal solution has been implemented.

  • We introduce the concept of data curation debt: the data curation equivalent of the software engineering principle of technical debt.

  • In the context of cohort studies, where data have often been collected over a long period of time, we introduce an analogue of technical debt: “data curation debt”.

Summary

Introduction

Software engineering has a well-defined concept of technical debt,[1] first described in 1992.[2] It gives an indication of the work required to re-engineer software when a suboptimal solution has been implemented. There are many reasons for intentionally accruing technical debt, for example time pressures or a lack of available expertise. It is also common to accrue technical debt unknowingly: for example, when a software developer edits small parts of a large code base without understanding its overarching structure, they might introduce suboptimal approaches. Technical debt can be incurred in numerous ways, including not documenting code when writing it or not depositing code in a version control repository. Interest accrues on technical debt and makes it more difficult to pay back, for instance because, as time passes, it becomes harder to write documentation or to commit the relevant code to a version control repository. It is important to note that even when steps are taken to change practice and stop the accrual of new technical debt, the existing debt remains and must still be addressed.
