Abstract

In the past, scientists reported summaries of their findings; they did not provide their original data collections. Many stakeholders (e.g., funding agencies) are now requesting that such data be made publicly available. This mandate is being adopted to facilitate further discovery, and to mitigate waste and deficits in the research process. At the same time, the necessary infrastructure for data curation (e.g., repositories) has been evolving. The current target is to make research products FAIR (Findable, Accessible, Interoperable, Reusable), resulting in data that are curated and archived to be both human- and machine-compatible. However, most scientists have little training in data curation. Specifically, they are ill-equipped to annotate their data collections at a level that facilitates discoverability, aggregation, and broad reuse in a context separate from their creation or sub-field. To circumvent these deficits, data architects may collaborate with scientists to transform and curate data. This paper’s example of a data collection describes the electrical properties of outer hair cells isolated from the mammalian cochlea. The data are expressed with a variant of the Ontology for Biomedical Investigations (OBI), mirrored to provide the metadata and nested data architecture used within the Hierarchical Data Format version 5 (HDF5). Each digital specimen is displayed in a tree configuration (like directories in a computer) and consists of six main branches based on the ontology classes. The data collections, scripts, and ontological OWL file (OBI based Inner Ear Electrophysiology (OBI_IEE)) are deposited in three repositories. We discuss the impediments to producing such data collections for public use, and the tools and processes required for effective implementation.
This work illustrates the impact that small collaborations can have on the curation of our publicly funded collections, and is particularly salient for fields where data are sparse, throughput is low, and sacrifice of animals is required for discovery.
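
The tree configuration described in the abstract can be pictured with a minimal sketch. This is not the authors' HDF5 layout: it is a plain-Python stand-in for groups that carry metadata attributes, and the branch names (`protocol`, `voltage_step`, `organism`), the `ontology_term` key, and the `PLACEHOLDER:` identifiers are all hypothetical. Only two illustrative branches are shown, not the six the paper uses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Tuple


@dataclass
class Node:
    """One group in the tree: metadata attributes plus named child groups."""
    attrs: Dict[str, Any] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, name: str, **attrs: Any) -> "Node":
        """Create a child group under this one and return it."""
        child = Node(attrs=dict(attrs))
        self.children[name] = child
        return child


def walk(node: Node, path: str = "") -> Iterator[Tuple[str, Node]]:
    """Yield (path, node) pairs depth-first, like listing nested directories."""
    for name, child in node.children.items():
        child_path = f"{path}/{name}"
        yield child_path, child
        yield from walk(child, child_path)


# Hypothetical digital specimen. Branch names and term identifiers are
# placeholders chosen for illustration; they are not taken from OBI_IEE.
specimen = Node(attrs={"label": "example specimen"})
protocol = specimen.add("protocol", ontology_term="PLACEHOLDER:0001")
protocol.add("voltage_step", ontology_term="PLACEHOLDER:0002", unit="mV")
specimen.add("organism", ontology_term="PLACEHOLDER:0003")

# Print every group with its path and metadata.
for path, node in walk(specimen):
    print(path, node.attrs)
```

Keeping the ontology term as an attribute on each group means the annotation travels with the data it describes, which is the property the abstract attributes to the combined metadata and nested data architecture.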

Highlights

  • Collaborative curation of electrophysiology data collection attributes and the variant ontology: this construct allows for aggregation using the hierarchical strengths inherent to Hierarchical Data Format version 5 (HDF5), and it should make it easier for researchers, whether they are familiar or unfamiliar with such data, to understand and reuse them for their own purposes

  • If a researcher used a different protocol to interrogate the cell, this protocol could be added to the application ontology, with any associated new classes imported or defined

  • We show that a poorly structured data collection can become human-readable through the use of extensive descriptions and by employing an ontology


Summary

Introduction

Scientists publish reports that describe their experimental findings, while the data collected to uncover these findings are not typically reported, nor are they made readily available. To hasten discovery in auditory electrophysiology and to facilitate effective data sharing (addressing impediment (iii)), a scientist (BF) initiated a collaboration with a data architect (JB) to transform electrophysiological data from private to public use. This data collection describes the electrical properties of outer hair cells isolated from the mammalian cochlea of guinea pigs. The additional ontologies include the Computational Neuroscience Ontology (CNO) [95], Gene Ontology (GO) [97], Mammalian Phenotype Ontology (MP) [98], Ontology of Physics for Biology (OPB) [102], Semantic Science Integrated Ontology (SIO) [104], Systems Biology Ontology (SBO) [103], National Cancer Institute Thesaurus (NCIT) [100], and National Center for Biotechnology Information (NCBI) Organismal Classification (NCBITaxon) [99]. By mapping these data and metadata onto an application ontology, OBI based Inner Ear Electrophysiology (OBI_IEE), the logical connections of the data are preserved, which should enhance opportunities for search and discovery of these data [60]. Because the data and metadata are combined, researchers seeking to reference and reuse the data should find sufficient qualitative context to make meaningful use of the data into the future [54].
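
To make the search and aggregation claim concrete, the following sketch collects every node annotated with a given ontology term across several specimens. It is an illustration under stated assumptions, not the deposited scripts: specimens are modeled as nested dictionaries rather than HDF5 files, and the specimen identifiers, the `term` key, the `PLACEHOLDER:capacitance` label, and the numeric values are all invented for the example.

```python
from typing import Any, Dict, Iterator, List, Tuple

Tree = Dict[str, Any]


def find_by_term(tree: Tree, term: str, path: str = "") -> Iterator[Tuple[str, Tree]]:
    """Yield (path, node) for every node whose 'term' annotation equals `term`."""
    if tree.get("term") == term:
        yield path or "/", tree
    for name, child in tree.get("children", {}).items():
        yield from find_by_term(child, term, f"{path}/{name}")


def aggregate(specimens: Dict[str, Tree], term: str) -> List[Tuple[str, str, Any]]:
    """Collect (specimen id, path, value) for one term across all specimens."""
    rows = []
    for specimen_id, tree in specimens.items():
        for path, node in find_by_term(tree, term):
            rows.append((specimen_id, path, node.get("value")))
    return rows


# Hypothetical specimens: identifiers, term labels, and values are made up
# for illustration and do not come from the published data collection.
specimens = {
    "cell_A": {"children": {"recording": {"children": {
        "capacitance": {"term": "PLACEHOLDER:capacitance", "value": 1.0, "unit": "pF"},
    }}}},
    "cell_B": {"children": {"recording": {"children": {
        "capacitance": {"term": "PLACEHOLDER:capacitance", "value": 2.0, "unit": "pF"},
    }}}},
}

for row in aggregate(specimens, "PLACEHOLDER:capacitance"):
    print(row)
```

Because the lookup keys on the ontology term rather than on a file-specific column name, the same query works for any specimen that follows the shared annotation scheme.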

Design data architecture based upon ontology
Results and discussion
Concluding remarks
