Abstract
Background: Today a variety of phylogenetic file formats exists, some of which are well-established but limited in their data model, while other more recently introduced ones offer advanced features for metadata representation. Although most currently available software only supports the classical formats with a limited metadata model, it would be desirable to have support for the more advanced formats. This is necessary for users to produce richly annotated data that can be efficiently reused and make underlying workflows easily reproducible. A programming library that abstracts over the data and metadata models of the different formats and allows supporting all of them in one step would significantly simplify the development of new and the extension of existing software to address the need for better metadata annotation.

Results: We developed the Java library JPhyloIO, which allows event-based reading and writing of the most common alignment and tree/network formats. It allows full access to all features of the nine currently supported formats. By implementing a single JPhyloIO-based reader and writer, application developers can support all of these formats. Due to the event-based architecture, JPhyloIO can be combined with any application data structure, and is memory efficient for large datasets. JPhyloIO is distributed under LGPL. Detailed documentation and example applications (available on http://bioinfweb.info/JPhyloIO/) significantly lower the entry barrier for bioinformaticians who wish to benefit from JPhyloIO’s features in their own software.

Conclusion: JPhyloIO enables simplified development of new and extension of existing applications that support various standard formats simultaneously. This has the potential to improve interoperability between phylogenetic software tools and at the same time motivate usage of more recent metadata-rich formats such as NeXML or phyloXML.
Highlights
Today a variety of phylogenetic file formats exists, some of which are well-established but limited in their data model, while other more recently introduced ones offer advanced features for metadata representation
Simple annotations and more complex metadata can be attached to all elements of a document and JPhyloIO translates these using the available features of each supported format
Classic formats like FASTA, PHYLIP or NEXUS still play an important role when working with sequences, alignments and phylogenetic trees, mainly because widely used applications often rely solely on these formats
Summary
Today a variety of phylogenetic file formats exists, some of which are well-established but limited in their data model, while other more recently introduced ones offer advanced features for metadata representation. Although most currently available software only supports the classical formats with a limited metadata model, it would be desirable to have support for the more advanced formats. This is necessary for users to produce richly annotated data that can be efficiently reused and make underlying workflows reproducible. Increasingly cheaper high-throughput sequencing technologies [3], ... While these developments open up new perspectives for studies and applications that make use of big data, the practical reusability of data continues to be an issue. Linking relevant external resources (e.g., voucher information, digitized specimens or sequencing raw data) and providing metadata that reliably identifies the methods that were used to generate data (e.g., the software and parameters used for a phylogenetic inference) would further improve reusability of data and reproducibility of studies. Storing the results of phylogenetic analyses using metadata-rich formats is an ideal basis to link all necessary metadata and resources.

Stöver et al. BMC Bioinformatics (2019) 20:402