The Biological Object Notation (BON): a structured file format for biological data

Jan P Buchmann, Mathieu Fourment, Edward C Holmes

doi:10.1038/s41598-018-28016-6

Abstract

The large size and high complexity of biological data can represent a major methodological challenge for the analysis and exchange of data sets between computers and applications. There has also been a substantial increase in the amount of metadata associated with biological data sets, which is being increasingly incorporated into existing data formats. Despite the existence of structured formats based on XML, biological data sets are mainly formatted using unstructured file formats, and the incorporation of metadata results in increasingly complex parsing routines that become more error-prone. To overcome these problems, we present the “biological object notation” (BON) format, a new way to exchange and parse nearly all biological data sets more efficiently and with fewer errors than other currently available formats. Based on JavaScript Object Notation (JSON), BON simplifies parsing by clearly separating the biological data from its metadata and reduces complexity compared to XML-based formats. The ability to selectively compress data by up to 87% compared to other file formats, together with the reduced complexity, results in improved transfer times and less error-prone applications.

Highlights

Biological data, which include, but are not limited to, molecular sequences, annotations and phylogenetic trees, are still predominantly exchanged as flat files or in line-based formats despite the existence of more structured file notations that are better suited to complex data.
NCBI’s Entrez utility or “Representational state transfer” (REST) application programming interfaces (APIs) only export biological data in FASTA and XML formats; other information is available in JSON [9].
To demonstrate the versatility of the “biological object notation” (BON), we designed a method to encode phylogenetic trees using the JavaScript Object Notation (JSON) syntax, based on NeXML [6], which allows the addition of arbitrary metadata (Fig. 3e; Supplementary Table 5).
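
As a rough illustration of the tree encoding mentioned in the last highlight, the Python sketch below stores a small phylogenetic tree as JSON nodes and edges, in the spirit of NeXML, with optional metadata attached at the tree and edge level. The field names (`meta`, `nodes`, `edges`, `source`, `target`, `length`) and the `to_newick` helper are assumptions made for this example; they are not taken from the published BON specification.

```python
import json

# Hypothetical layout: all field names here are illustrative assumptions,
# not the published BON schema.
tree = {
    "meta": {"source": "example", "method": "neighbor-joining"},
    "nodes": [
        {"id": "n0", "label": None},   # unlabeled internal (root) node
        {"id": "n1", "label": "A"},
        {"id": "n2", "label": "B"},
    ],
    "edges": [
        # Arbitrary metadata can sit next to the structural fields.
        {"source": "n0", "target": "n1", "length": 0.1, "meta": {"support": 0.98}},
        {"source": "n0", "target": "n2", "length": 0.2},
    ],
}


def to_newick(tree, root_id):
    """Render the node/edge structure as a Newick string, ignoring metadata."""
    labels = {n["id"]: n["label"] or "" for n in tree["nodes"]}
    children = {}
    for edge in tree["edges"]:
        children.setdefault(edge["source"], []).append((edge["target"], edge["length"]))

    def render(node_id):
        kids = children.get(node_id, [])
        if not kids:
            return labels[node_id]
        inner = ",".join(f"{render(child)}:{length}" for child, length in kids)
        return f"({inner}){labels[node_id]}"

    return render(root_id) + ";"


# Round-trip through JSON text, then convert the restored tree.
text = json.dumps(tree)
restored = json.loads(text)
newick = to_newick(restored, "n0")
print(newick)
```

Because the metadata lives in its own keys, a consumer that only needs the topology can skip it entirely, as `to_newick` does here.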

Summary

Introduction

Biological data, which include, but are not limited to, molecular sequences, annotations and phylogenetic trees, are still predominantly exchanged as flat files or in line-based formats despite the existence of more structured file notations that are better suited to complex data. These structures can describe virtually all biological data sets while retaining low parsing complexity, for example because additional checks like those for attribute values in XML tags are omitted. The TinySeq and uncompressed BON files in the Genome and Collection data sets were almost identical in size, with the exception of the Plant EST subset that was ~18% smaller in BON. The compressed BON files were between 43% and 70% smaller in the nucleotide sequence data sets (Fig. 3a,b; Supplementary Table 2).
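
The idea of keeping metadata readable while compressing only the data can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout (`meta`, `data`, `encoding`, `payload`), the function names, and the use of zlib with Base64 are choices made for this example and are not confirmed as the method BON itself uses.

```python
import base64
import json
import zlib


def encode_record(metadata, sequence, compress=True):
    """Hypothetical encoder: metadata stays plain JSON, data is optionally compressed."""
    if compress:
        # Compress the sequence, then Base64-encode it so it fits in a JSON string.
        raw = zlib.compress(sequence.encode("ascii"))
        payload = base64.b64encode(raw).decode("ascii")
        encoding = "zlib+base64"
    else:
        payload = sequence
        encoding = "plain"
    record = {"meta": metadata, "data": {"encoding": encoding, "payload": payload}}
    return json.dumps(record)


def decode_record(text):
    """Return (metadata, sequence) from a record written by encode_record."""
    record = json.loads(text)
    data = record["data"]
    if data["encoding"] == "zlib+base64":
        sequence = zlib.decompress(base64.b64decode(data["payload"])).decode("ascii")
    else:
        sequence = data["payload"]
    return record["meta"], sequence


meta = {"id": "seq1", "organism": "example"}
seq = "ACGT" * 500  # artificial, highly repetitive sequence

packed = encode_record(meta, seq)
plain = encode_record(meta, seq, compress=False)
print(len(plain), len(packed))
```

A reader can inspect `meta` without decompressing anything, which is the point of compressing selectively. The size reduction on this artificial repetitive sequence says nothing about real data sets; the figures reported above come from the paper's own measurements.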



Journal: Scientific Reports
Publication Date: Jun 25, 2018
Citations: 1
License type: open-access


Similar Papers

Implementation of a general documentation system for web-based administration and use of historical series of meteorological and biological data
Tor Håkon Sivertsen
Physics and Chemistry of the Earth | VOL. 30
26 Oct 2004

Branching, blending, and the evolution of cultural similarities and differences among human populations
Mark Collard ... Jamshid J Tehrani
Evolution and Human Behavior | VOL. 27
23 Sep 2005

A fast and efficient python library for interfacing with the Biological Magnetic Resonance Data Bank
Andrey Smelter ... Hunter N B Moseley
BMC Bioinformatics | VOL. 18
17 Mar 2017

Translating JSON Data into Relational Data Using Schema-oblivious Approaches
Rahwa Bahta ... Mustafa Atay
18 Apr 2019
