Abstract

Motivation: Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated provides an opportunity to detect species even at very low abundance, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the incompleteness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that exploring the species space with Markov chain Monte Carlo (MCMC), running several chains in parallel, enables the identification of the set of species most likely to contribute to the mixture.

Results: We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures.

Availability and implementation: metaMix is implemented as a user-friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix

Contact: sofia.morfopoulou.10@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
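The core difficulty named above, reads that match several organisms, can be illustrated with a minimal sketch of a finite mixture model. This is not metaMix's algorithm: metaMix is Bayesian and explores the species space with MCMC, whereas the sketch below swaps in plain expectation-maximization (EM) only because it is short. It assumes a hypothetical read-by-species likelihood matrix `lik` (in practice such values might be derived from read alignments), and the function name `estimate_proportions` is likewise hypothetical.

```python
from typing import List


def estimate_proportions(lik: List[List[float]], n_iter: int = 500,
                         tol: float = 1e-10) -> List[float]:
    """Fit mixture weights w[s] by EM, given lik[r][s] = P(read r | species s).

    Hypothetical helper for illustration only; not part of metaMix.
    """
    n_species = len(lik[0])
    w = [1.0 / n_species] * n_species  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: expected number of reads attributed to each species,
        # splitting each ambiguous read in proportion to w[s] * lik[r][s].
        counts = [0.0] * n_species
        for row in lik:
            joint = [w[s] * row[s] for s in range(n_species)]
            total = sum(joint)
            if total == 0.0:
                continue  # read not explained by any candidate species
            for s in range(n_species):
                counts[s] += joint[s] / total
        n_explained = sum(counts)
        if n_explained == 0.0:
            raise ValueError("no read is explained by the candidate species")
        # M-step: new proportions are the normalised expected counts.
        new_w = [c / n_explained for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


# Toy data: 6 reads unique to species 0, 3 unique to species 1,
# and 3 reads that match species 0 and species 2 equally well.
reads = [[1.0, 0.0, 0.0]] * 6 + [[0.0, 1.0, 0.0]] * 3 + [[1.0, 0.0, 1.0]] * 3
weights = estimate_proportions(reads)
print([round(x, 3) for x in weights])  # prints [0.75, 0.25, 0.0]
```

In the toy data, species 2 is supported only by reads it shares with species 0, so EM drives its weight towards zero. Deciding whether such a species belongs in the mixture at all is the question of identifying the set of contributing species, which metaMix addresses by exploring the species space rather than by fitting weights for a fixed candidate list.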

Highlights

  • We demonstrate the potential of metaMix using datasets from clinical samples as well as benchmark metagenomic datasets

Introduction

Metagenomics can be defined as the analysis of a collection of DNA or RNA sequences originating from a single sample. Its scope is broad and includes the analysis of diverse samples such as the gut microbiome (Qin et al., 2010; Minot et al., 2011), environmental samples (Mizuno et al., 2013) and clinical samples (Willner et al., 2009; Negredo et al., 2011; McMullan et al., 2012). Among these applications, the discovery of viral pathogens is clearly relevant for clinical practice (Fancello et al., 2012; Chiu, 2013). Part of the difficulty of identifying such pathogens stems from the short read lengths of existing deep DNA sequencing technologies, an issue compounded by the extensive homology across viral and bacterial species.
