MolBioLib: a C++11 framework for rapid development and deployment of bioinformatics tasks

Toshiro K Ohsumi, Mark L Borowsky

doi:10.1093/bioinformatics/bts458

Open Access

Abstract

Summary: We developed MolBioLib to address the need for adaptable next-generation sequencing analysis tools. The result is a compact, portable and extensively tested C++11 software framework and set of applications tailored to the demands of next-generation sequencing data and applicable to many other applications. MolBioLib is designed to work with common file formats and data types used both in genomic analysis and general data analysis. A central relational-database-like Table class is a flexible and powerful object to intuitively represent and work with a wide variety of tabular datasets, ranging from alignment data to annotations. MolBioLib has been used to identify causative single-nucleotide polymorphisms in whole genome sequencing, detect balanced chromosomal rearrangements and compute enrichment of messenger RNAs (mRNAs) on microtubules, typically requiring applications of under 200 lines of code. MolBioLib includes programs to perform a wide variety of analysis tasks, such as computing read coverage, annotating genomic intervals and novel peak calling with a wavelet algorithm. Although MolBioLib was designed primarily for bioinformatics purposes, much of its functionality is applicable to a wide range of problems. Complete documentation and an extensive automated test suite are provided.

Availability: MolBioLib is available for download at: http://sourceforge.net/projects/molbiolib

Contact: ohsumit@molbio.mgh.harvard.edu
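
To make the idea of a relational-database-like table of typed columns concrete, the following is a minimal C++11 sketch. It is not MolBioLib's Table class or its API: the names `SimpleTable`, `addRow`, `where`, `column`, `IntervalTable` and `totalLengthOn` are hypothetical, and the (chromosome, start, end) column layout with half-open coordinates is an assumption made only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical illustration only: this is NOT MolBioLib's Table class.
// Each row is a std::tuple whose element types are the column types.
template <typename... Cols>
class SimpleTable {
public:
    typedef std::tuple<Cols...> Row;

    // Append one row; arguments are converted to the column types.
    void addRow(const Cols&... values) { rows_.emplace_back(values...); }

    std::size_t size() const { return rows_.size(); }

    // Bounds-checked row access (throws std::out_of_range on a bad index).
    const Row& row(std::size_t i) const { return rows_.at(i); }

    // Return a new table holding the rows for which pred(row) is true,
    // loosely analogous to a SQL WHERE clause.
    template <typename Pred>
    SimpleTable where(Pred pred) const {
        SimpleTable out;
        for (const Row& r : rows_) {
            if (pred(r)) out.rows_.push_back(r);
        }
        return out;
    }

    // Copy out column I, loosely analogous to selecting a single column.
    template <std::size_t I>
    std::vector<typename std::tuple_element<I, Row>::type> column() const {
        std::vector<typename std::tuple_element<I, Row>::type> out;
        out.reserve(rows_.size());
        for (const Row& r : rows_) out.push_back(std::get<I>(r));
        return out;
    }

private:
    std::vector<Row> rows_;
};

// Assumed column layout for this example: chromosome, start, end.
typedef SimpleTable<std::string, long, long> IntervalTable;

// Sum of (end - start) over all intervals on one chromosome,
// assuming half-open [start, end) coordinates.
long totalLengthOn(const IntervalTable& t, const std::string& chrom) {
    IntervalTable hits = t.where([&chrom](const IntervalTable::Row& r) {
        return std::get<0>(r) == chrom;
    });
    long total = 0;
    for (std::size_t i = 0; i < hits.size(); ++i) {
        total += std::get<2>(hits.row(i)) - std::get<1>(hits.row(i));
    }
    return total;
}
```

Storing rows as tuples keeps column types checked at compile time, which is one plausible way to get database-like filtering and projection in a few lines of C++11; a real framework may well organize its storage differently.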

Full Text