About a year ago, PLOS implemented a new process intended to further the overarching principle that data used in the work we publish should be accessible and reusable. The motivation goes hand-in-hand with both our open access ethos and the scientific method itself: the validity of a conclusion depends on the ability to reproduce the underlying results. In theory, PLOS’ new data policy, in which “all data underlying the findings described [must be] fully available,” is not so new; in practice, though, it may be perceived as burdensome, complicated, and/or inefficient. The policy and its purpose have been discussed extensively among PLOS Genetics Editorial Board members with regard to their potential impact on editors, authors, the community, and research subjects. The purpose of this editorial is twofold: to acknowledge and discuss aspects of the “data-sharing” process that are especially challenging, and to provide additional clarification and guidance in the context of a few scenarios that are especially relevant, all from the perspective of working scientists who read, evaluate, and contribute to research based on genetics and genomics. We also suggest a way forward that builds on what already works at PLOS: consulting with multiple stakeholders whose interests intersect to develop consensus. An issue that is often problematic for PLOS Genetics authors is the sheer volume of data generated by large-scale phenotyping and/or genotyping studies. Whether the data come from a confocal microscope, a radiofrequency detector in an MRI, or a CCD in a DNA sequencing instrument, processing, filtering, and compression of digital data are inherent across many areas of modern biology. It is often neither practical nor wise to archive and distribute the primary output of digital detectors; indeed, the question of what constitutes “raw data” is a moving target. A second problematic issue arises from ethical concerns associated with human research subjects.
The potential to identify research subjects based on genomic information has received considerable attention and has fostered the development of controlled access mechanisms, in which researchers must seek approval from data access committees. As with any new set of regulations, there is a risk of creating more problems than are solved; from an editorial perspective, we recognize that authors must commit significant resources to deposit data into controlled access repositories, and that the process of extracting data can also be problematic. Regulatory burdens also create conflicts since the legitimate concerns of human subject review boards and governmental agencies to protect privacy are not necessarily prima facie aligned with open access policy. In short, the system is not working efficiently. Finally, for research that is not publicly funded, there can be legitimate reasons to restrict data sharing. Personal genomics companies, agricultural genetics corporations, and domestic animal breeders all support worthwhile research, but may have responsibilities to shareholders and/or contributors that restrict the ability to fully share the underlying data. In addition, there are efforts in some countries to protect data and genetic resources as part of a national interest. While we acknowledge and understand the underlying rationale for governmental and/or commercial restrictions on data sharing, these represent an inherent conflict with the principles of open access.