Abstract

Because more than 90% of the species in a microbial community cannot be isolated and cultivated, metagenomic methods have become among the most important approaches for analyzing a microbial community as a whole. With the rapid accumulation of metagenomic samples and advances in next-generation sequencing techniques, it is now possible to qualitatively and quantitatively assess all taxa (features) in a microbial community. A set of taxa, characterized by presence/absence or by differing abundances, could potentially be used as taxonomical biomarkers for identifying the corresponding microbial community’s phenotype. Although some bioinformatics methods for metagenomic biomarker discovery exist, current methods are not robust, accurate, and fast enough at selecting non-redundant biomarkers for predicting a microbial community’s phenotype. In this study, we propose a novel method, MetaBoot, which combines mRMR (minimal redundancy maximal relevance) and bootstrapping to discover non-redundant biomarkers for microbial communities by mining metagenomic data. MetaBoot was tested and compared with other methods on well-designed simulated datasets based on normal and gamma distributions, as well as on publicly available metagenomic datasets. The results show that MetaBoot is robust across datasets of varied complexity and taxonomical distribution patterns and can select discriminative biomarkers with high accuracy and biological consistency. Thus, MetaBoot is suitable for robustly and accurately discovering taxonomical biomarkers for different microbial communities.
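To make the combination of mRMR and bootstrapping more concrete, the following is a minimal sketch and not the MetaBoot implementation. It assumes absolute Pearson correlation as a simple stand-in for the relevance and redundancy measures (mRMR is often formulated with mutual information), and the function names `mrmr_select` and `bootstrap_mrmr`, the `min_freq` threshold, and the toy data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mrmr_select(rows, labels, k):
    """Greedy mRMR: repeatedly add the feature with the highest
    (relevance to labels) minus (mean redundancy with selected features)."""
    n_features = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_features)]
    relevance = [abs(pearson(c, labels)) for c in cols]
    selected = []
    while len(selected) < min(k, n_features):
        best, best_score = None, float("-inf")
        for j in range(n_features):
            if j in selected:
                continue
            if selected:
                redundancy = mean([abs(pearson(cols[j], cols[s])) for s in selected])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


def bootstrap_mrmr(rows, labels, k, n_boot=100, min_freq=0.8, seed=0):
    """Run mRMR on bootstrap resamples and keep the features that are
    selected in at least `min_freq` of the resamples (assumed rule)."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        counts.update(mrmr_select([rows[i] for i in idx],
                                  [labels[i] for i in idx], k))
    return sorted(j for j, c in counts.items() if c / n_boot >= min_freq)


# Toy data (hypothetical): taxon 0 separates the two groups, taxon 1 is an
# exact duplicate of taxon 0 (fully redundant), taxon 2 is constant.
labels = [0] * 10 + [1] * 10
signal = [10.0 * labels[i] + 0.1 * (i % 5) for i in range(20)]
rows = [[signal[i], signal[i], 5.0] for i in range(20)]

stable = bootstrap_mrmr(rows, labels, k=1)
```

In this toy setup the duplicated taxon carries the same relevance as the original but is penalized for redundancy once the original has been chosen, which is the behavior the redundancy term is meant to produce. The bootstrap step then reports only features that are chosen consistently across resamples.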

Highlights

  • The estimated number of microbial cells on earth is approximately 10^30 (Proctor, 1994), which is huge, and a large number of novel genes with useful functions might be contained within the genomes of these unknown communities of microbes

  • We first analyzed the taxonomical distribution properties of real metagenomic samples and then generated sets of synthetic datasets with known ground-truth biomarkers and distribution properties learned from real data

  • A simulated synthetic dataset can contain such “ground truth”; simulating the taxonomical distribution properties of real metagenomic samples is critical for the validity of such a synthetic dataset
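As an illustration of what a synthetic dataset with known ground-truth biomarkers might look like, here is a minimal sketch under stated assumptions. The function name `simulate_abundances`, the fold change, the 20% relative standard deviation, and the gamma shape parameter are all hypothetical values chosen for the example; they are not taken from the study, which learned its distribution properties from real data.

```python
import random


def simulate_abundances(n_per_group, n_taxa, biomarkers, dist="normal",
                        base_mean=10.0, fold_change=3.0, seed=0):
    """Return (rows, labels) for two groups of samples over n_taxa taxa.

    Taxa listed in `biomarkers` have their mean multiplied by `fold_change`
    in group 1; all other taxa follow the same distribution in both groups,
    so `biomarkers` is the ground truth a selection method should recover.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = []
            for taxon in range(n_taxa):
                mu = base_mean
                if group == 1 and taxon in biomarkers:
                    mu *= fold_change
                if dist == "normal":
                    # Abundances cannot be negative, so clip at zero.
                    value = max(0.0, rng.gauss(mu, 0.2 * mu))
                elif dist == "gamma":
                    shape = 2.0  # assumed shape; scale set so the mean is mu
                    value = rng.gammavariate(shape, mu / shape)
                else:
                    raise ValueError(f"unknown distribution: {dist}")
                row.append(value)
            rows.append(row)
            labels.append(group)
    return rows, labels


# Example: 50 samples per group, 6 taxa, taxa 1 and 4 are true biomarkers.
gamma_rows, gamma_labels = simulate_abundances(50, 6, {1, 4}, dist="gamma", seed=1)
```

Because the biomarker indices are fixed when the data are generated, the output of any selection method can be scored directly against them.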



Introduction

The estimated number of microbial cells on earth is approximately 10^30 (Proctor, 1994), which is huge, and a large number of novel genes with useful functions might be contained within the genomes of these unknown communities of microbes. With the development of next-generation sequencing (NGS), the metagenomic method has become one of the important methods that can provide direct access to the genomes of as-yet-uncultivated microorganisms in their native environments (Eisen, 2007). Metagenomics makes it possible to better understand microbial diversity as well as microbial functions. It has become an increasingly popular research area, given its diversity and its potential applications in environmental sciences, bioenergy, and human health.
