Abstract

Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations.Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology’s GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation is provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.

Highlights

  • As the cost of genomic laboratory experiments has fallen, vast amounts of genomic data are produced by high-throughput sequencing technologies as well as inexpensive microarrays in a wide range of studies

  • The goal of this paper is to overcome these issues by introducing the Genomic Feature and Variation Ontology (GFVO), which addresses this status quo by providing well described named ontological concepts and relations, by defining formal constraints on them, and by potentially enabling unified data access through a generic data representation

  • GFVO unifies the vocabulary of the genomic feature and variation file format specifications by providing a well-defined and documented set of classes and properties that capture the gist of genomic data

Read more

Summary

Introduction

As the cost of genomic laboratory experiments has fallen, vast amounts of genomic data are produced by high-throughput sequencing technologies as well as inexpensive microarrays in a wide range of studies (cf. Grada & Weinbrecht, 2013; Guarnaccia et al, 2014). Whilst the produced data are rich in information content, they are to varying degree kept in data silos due to the lack of a single unified standard for data exchange and data integration. This increases the cost of further data analysis, leaving much of that data not adequately used or unused (Sboner et al, 2011)

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.