Abstract

Background

The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short read sequences across many microbial communities. De novo assembly of the short reads can be especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low, so traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known whether these approaches can be used for the comparison of metatranscriptome datasets, or which dissimilarity measures perform best.

Results

We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from the 454 pyrosequencing and Illumina sequencing platforms to evaluate their effectiveness for the clustering of metatranscriptomic samples. The measures included three dissimilarity measures, one dissimilarity measure in CVTree, one relative entropy based measure S2 and three classical distances. The results showed that the measure can achieve superior performance in clustering metatranscriptomic samples into different groups under different sequencing depths for both 454 and Illumina datasets, in recovering environmental gradients affecting microbial samples, and in classifying coexisting metagenomic and metatranscriptomic datasets, and that it is robust to sequencing errors. We also investigated the effects of tuple size and of the order of the background Markov model. A software pipeline implementing all the steps of the analysis has been built and is available at http://code.google.com/p/d2-tools/.

Conclusions

The k-tuple based sequence signature measures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The dissimilarity measure performs well in all application scenarios and its performance is robust with respect to tuple size and order of the Markov model.

Highlights

  • The comparison of microbial communities is crucial for understanding how environmental factors affect the composition and the function of the communities [1]

  • Real data are used to analyze the effectiveness of k-tuple based sequence signature measures for the comparison of microbial community samples

  • The data are from different geographic locations including Hawaiian Ocean, Mexican Gulf, California Gulf, Norwegian Fjord, North Atlantic ocean, South Pacific ocean, Western English Channel and Eastern Equatorial Atlantic Ocean mixed with Amazon river plume


Summary

Introduction

The comparison of microbial communities is crucial for understanding how environmental factors affect the composition and the function of the communities [1]. In metatranscriptome studies, RNA sequences are sampled from the communities and the expression levels of various RNA molecules can be estimated. For both metagenomic and metatranscriptomic data, alignment-based approaches for the comparison of communities may not be applicable because the reads can be sampled from different parts of the genomes or RNAs of the various organisms. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. It is not known whether these approaches can be used for the comparison of metatranscriptome datasets, or which dissimilarity measures perform best.
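
To make the idea of a k-tuple frequency comparison concrete, the following is a minimal Python sketch, not the pipeline distributed with the paper. It pools the reads of a sample, turns the counts of all k-tuples over A, C, G, T into relative frequencies, and compares two samples with the Manhattan and Euclidean distances as examples of classical distances. The function names, the toy reads, and the decisions to skip tuples containing other characters and to ignore the reverse strand are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def ktuple_frequencies(reads, k):
    """Relative frequency of every k-tuple over A, C, G, T, pooled over all reads.

    Illustrative assumption: tuples containing any other character
    (for example N) are skipped, and only the forward strand is counted.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if all(base in ALPHABET for base in word):
                counts[word] += 1
    total = sum(counts.values())
    # Enumerate all 4**k possible tuples so both samples share the same keys.
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def manhattan(f, g):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(f[w] - g[w]) for w in f)


def euclidean(f, g):
    """Square root of the summed squared differences."""
    return sum((f[w] - g[w]) ** 2 for w in f) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy samples; real samples contain many short reads.
    sample_a = ["ACGT"]
    sample_b = ["ACAC"]
    fa = ktuple_frequencies(sample_a, 2)
    fb = ktuple_frequencies(sample_b, 2)
    print(manhattan(fa, fb), euclidean(fa, fb))
```

A pairwise matrix of such distances over all samples could then be passed to a standard clustering routine. Measures that correct for a background Markov model need the expected tuple counts as well, which this sketch does not attempt.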
