Abstract

Genomics and molecular imaging, along with clinical and translational research have transformed biomedical science into a data-intensive scientific endeavor. For researchers to benefit from Big Data sets, developing long-term biomedical digital data preservation strategy is very important. In this opinion article, we discuss specific actions that researchers and institutions can take to make research data a continued resource even after research projects have reached the end of their lifecycle. The actions involve utilizing an Open Archival Information System model comprised of six functional entities: Ingest, Access, Data Management, Archival Storage, Administration and Preservation Planning. We believe that involvement of data stewards early in the digital data life-cycle management process can significantly contribute towards long term preservation of biomedical data. Developing data collection strategies consistent with institutional policies, and encouraging the use of common data elements in clinical research, patient registries and other human subject research can be advantageous for data sharing and integration purposes. Specifically, data stewards at the onset of research program should engage with established repositories and curators to develop data sustainability plans for research data. Placing equal importance on the requirements for initial activities (e.g., collection, processing, storage) with subsequent activities (data analysis, sharing) can improve data quality, provide traceability and support reproducibility. Preparing and tracking data provenance, using common data elements and biomedical ontologies are important for standardizing the data description, making the interpretation and reuse of data easier. The Big Data biomedical community requires scalable platform that can support the diversity and complexity of data ingest modes (e.g. machine, software or human entry modes). Secure virtual workspaces to integrate and manipulate data, with shared software programs (e.g., bioinformatics tools), can facilitate the FAIR (Findable, Accessible, Interoperable and Reusable) use of data for near- and long-term research needs.

Highlights

  • Over the past decade, major advancements in the speed and resolution of acquiring data has resulted in a new paradigm, ‘Big Data.’ The impact of Big Data can be seen in the biomedical field

  • The actions involve utilizing an Open Archival Information System model comprised of six functional entities: Ingest, Access, Data Management, Archival Storage, Administration and Preservation Planning

  • We believe that involvement of data stewards early in the digital data life-cycle management process can significantly contribute towards long term preservation of biomedical data

Read more

Summary

Introduction

Major advancements in the speed and resolution of acquiring data has resulted in a new paradigm, ‘Big Data.’ The impact of Big Data can be seen in the biomedical field. The OAIS model includes six functions (shown in Figure 1) - Ingest, Access, Data Management, Archival Storage, Administration and Preservation Planning. An important practice for ensuring good research management in laboratories includes selecting the right medium (paper-based and/or electronic) for laboratory notebooks[11] Administration Both producers and consumers of data will be best served by implementing established procedures for digital preservation. A cloud-based data archive platform (shown in Figure 2) can provide a dynamic environment for managing research data life cycle along with capabilities for long-term preservation of biomedical data[30]. Access Needs in biomedical research can vary from simple queries (as shown in Figure 1) to a wide range of capabilities (workflow and software tools) usually employed for analysis of large scale data sets (genomics data)[31].

Conclusion
Michener WK
11. Schnell S
15. O’Reilly PD
17. Kazic T
22. Leonelli S
26. Kirlew PW
36. The Linux Foundation and Open API Initiative
Standard ISO: 14721: International Organization for Standardization: 2012
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call