Abstract

Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and on a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schemes do not allow an adequate description of the structure of the microbiota. This severely limits the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that identify the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, so that sequences that are difficult to classify taxonomically are also included in the analysis. The phylogenetic clades are weighted and ranked according to their abundance, measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of the most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.
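
To make the idea of feature weighting concrete, the following is a minimal sketch of the classic Relief weight update applied to a plain table of per-feature abundances. It is not the authors' method: it ignores the phylogenetic tree entirely, and the link between the method and the Relief family is an assumption based only on the name PhyloRelief. The function names `relief_weights` and `rank_features` and the toy data are hypothetical, introduced for illustration.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Classic Relief weights, visiting every sample once (illustrative only)."""
    n, p = len(X), len(X[0])

    # Per-feature range, used to scale each difference into [0, 1].
    spans = []
    for f in range(p):
        column = [row[f] for row in X]
        span = max(column) - min(column)
        spans.append(span if span > 0 else 1.0)

    def diff(f: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[f] - b[f]) / spans[f]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(f, a, b) for f in range(p))

    totals = [0.0] * p
    used = 0
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # sample has no same-class or other-class neighbour
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        # Reward features that differ from the nearest miss,
        # penalise features that differ from the nearest hit.
        for f in range(p):
            totals[f] += diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])
        used += 1

    return [t / used for t in totals] if used else totals


def rank_features(weights: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least discriminative."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: column 0 separates the classes, column 1 does not.
    abundances = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.6]]
    labels = [0, 0, 1, 1]
    weights = relief_weights(abundances, labels)
    print(weights, rank_features(weights))
```

In this sketch each column stands for one pre-defined feature. The abstract describes grouping taxa into clades from the data rather than from fixed categories, which this example does not attempt.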

Highlights

  • Thanks to the possibility of characterizing microbial communities through next-generation sequencing, microbial ecology has become a central topic in many environmental and therapeutic applications

  • For a large number of therapeutic and diagnostic applications, it would be essential to identify and rank the microbial taxa that are most relevant in comparisons between sets of samples

  • PhyloRelief can be applied both to cases in which sequences can be classified according to a known taxonomy, and to cases in which this is not feasible, a common occurrence in metagenomic data analysis given the increasing number of new and uncultivable taxa that are discovered using these technologies


Summary

Introduction

Thanks to the possibility of characterizing microbial communities through next-generation sequencing, microbial ecology has become a central topic in many environmental and therapeutic applications. A correlation between imbalances or abnormal composition of the gut microbiota and a number of pathologic conditions has been proposed. These alterations might be due to therapeutic interventions, such as antibiotic treatment [8], or to differences in lifestyle [9]. One alternative for the bioremediation of microbiota alterations is supplementation with pro- or prebiotics, and it has been suggested that antibiotic treatment and vaccination can be used to guide the structure of the gut microbiota towards a state that is compatible with health [12,13]. Most of these intervention strategies would be considerably more effective given a precise definition of the microbial species that are differentially distributed in health and disease conditions. The low abundance of most microbial taxa in metagenomic samples poses additional challenges that have only recently been tackled with statistical methods [15].
