Abstract

Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories bear compelling similarities to the Internet in its early days. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than through costly “retrofitted” mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.

Highlights

  • Although an openly shared interaction platform confers great value to the biological research community, it may also introduce quality and security risks

  • In the section "Approaches for Improving Biological Databases," we attempt to introduce greater trust in the data and analyses by providing recommendations to mitigate or account for these errors and vulnerabilities, and we point to approaches used by other Internet databases

  • An important goal for bioinformatics is the continuous improvement of biological databases


Summary

INTRODUCTION

Although an openly shared interaction platform confers great value to the biological research community, it may also introduce quality and security risks. Without a system for trusted correction and revision, these shared resources may facilitate widespread dissemination and use of low-quality content, for instance, taxonomically misclassified or erroneous sequences. As these public databases increase in size and importance, they may fall victim to the same security issues and abuses that plague cyberspace to this day. If we act by developing the databases with quality and security as a design philosophy, we can protect these databases at a much lower cost and with fewer challenges than we currently face with the Internet. In this Perspective, we aim to outline some potential quality assurance and security weaknesses in existing public biological repositories. Some approaches have been proposed to protect against unauthorized disclosure (Kim and Lauter, 2015; Mandal et al., 2018; Ozercan et al., 2018) and, while we don’t survey these approaches in this Perspective, we note that the public database community may benefit from these ideas as well.

