Abstract

Comparing metagenomic samples is a critical step in understanding the relationships among microbial communities. Recently, next-generation sequencing (NGS) technologies have produced massive amounts of short-read data for microbial communities from different environments. Assembling these short reads can, however, be time-consuming and challenging. In addition, alignment-based methods for metagenome comparison are limited by incomplete genome and/or pathway databases. In contrast, alignment-free methods for metagenome comparison do not depend on the completeness of genome or pathway databases. Still, the existing alignment-free methods, and , which model k-tuple patterns using only one Markov chain for each sample, neglect the heterogeneity within metagenomic data, wherein potentially thousands of types of microorganisms are sequenced. To address this limitation in and , we organized NGS sequences into different reads bins and constructed several corresponding Markov models. Next, we modified the definitions of our previous alignment-free methods, and , to make them compatible with an analysis scheme that uses the proposed reads bins. We then used two simulated and three real metagenomic datasets to test the effect of the k-tuple size and the Markov order of the background sequences on the performance of these de novo alignment-free methods. Our newly developed alignment-free methods with reads binning outperformed alignment-free methods without reads binning in detecting the relationships among microbial communities, including whether they form groups or change along environmental gradients.
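The general idea of modelling k-tuple patterns with one Markov chain per reads bin, rather than one chain per sample, can be illustrated with a minimal Python sketch. This is not the authors' implementation or their dissimilarity measures: the function names (`count_ktuples`, `fit_start`, `fit_markov`, `word_probability`), the add-one pseudocounts, and the toy bins are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def count_ktuples(reads, k):
    """Count every k-tuple (word of length k) over all reads, skipping words with non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set(ALPHABET):
                counts[word] += 1
    return counts


def fit_start(reads, order):
    """Estimate the probability of the first `order` bases of a word (add-one pseudocount, an assumption)."""
    if order == 0:
        return {"": 1.0}
    counts = count_ktuples(reads, order)
    total = sum(counts.values())
    n_words = len(ALPHABET) ** order
    words = map("".join, product(ALPHABET, repeat=order))
    return {w: (counts[w] + 1) / (total + n_words) for w in words}


def fit_markov(reads, order):
    """Estimate P(next base | previous `order` bases) from (order + 1)-tuple counts."""
    ctx_counts = count_ktuples(reads, order + 1)
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=order)):
        total = sum(ctx_counts[ctx + b] for b in ALPHABET)
        # Add-one pseudocount so unseen transitions keep a nonzero probability.
        model[ctx] = {
            b: (ctx_counts[ctx + b] + 1) / (total + len(ALPHABET)) for b in ALPHABET
        }
    return model


def word_probability(word, start, model, order):
    """Probability of a k-tuple under the fitted Markov chain of the given order."""
    p = start[word[:order]]
    for i in range(order, len(word)):
        p *= model[word[i - order:i]][word[i]]
    return p


# Hypothetical reads bins: each bin gets its own background Markov model.
bins = {
    "bin_a": ["ACGTACGTAC", "CGTACGTTAC"],
    "bin_b": ["GGGCCCGGGC", "CCGGGCCCGG"],
}
order = 1
models = {
    name: (fit_start(reads, order), fit_markov(reads, order))
    for name, reads in bins.items()
}

# Expected probability of the 3-tuple "ACG" under each bin-specific background.
for name, (start, model) in models.items():
    print(name, word_probability("ACG", start, model, order))
```

Fitting one model per bin lets the expected k-tuple frequencies differ between groups of reads with different composition, which is the heterogeneity a single per-sample chain cannot represent.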

Highlights

  • The genomes discovered before May 2015 were used for training the Markov models for reads binning

  • We investigated the applications of the new alignment-free measures to compare metagenomic samples

Introduction

Understanding the impact of environmental factors on the composition of microbial communities, along with the effects of microbes on their hosts, is a crucial problem in microbiological studies. Metagenomic data provide the complete genetic information of microbial communities. A metagenomic sample usually contains millions of short reads, each several hundred base pairs long and randomly sampled from a region of a microbial genome in the community. Given the massive amount of metagenomic data, computational methods are in great demand for inferring the relationships between microbes and environmental factors or hosts. Quantifying the similarities and differences among microbial communities from multiple environments or hosts is one of the most important steps in metagenomic data analysis.
