Abstract

Investigation of microbial communities, particularly human-associated communities, is significantly enhanced by the vast amounts of sequence data produced by high-throughput sequencing technologies. However, these technologies yield high-dimensional, complex data sets that consist of a large proportion of zeros, non-negative skewed counts, and, frequently, a limited number of samples. These features distinguish sequence data from other forms of high-dimensional data and are not adequately addressed by statistical approaches in common use. Ultimately, medical studies may identify targeted interventions or treatments, but the lack of analytic tools for feature selection and for identifying the taxa responsible for differences between groups is hindering advancement. The objective of this paper is to examine the application of a two-part statistic to identify taxa that differ between two groups. The advantages of the two-part statistic over common statistical tests applied to sequence count data sets are discussed. Results from the t-test, the Wilcoxon test, and the two-part test are compared using sequence counts from microbial ecology studies in cystic fibrosis and from cenote samples. We show superior performance of the two-part statistic for analysis of sequence data. The improved performance in microbial ecology studies was independent of the study type and the sequencing technology used.

Highlights

  • Analysis of sequence variants of the small subunit ribosomal RNA gene (SSU-rRNA) is widely used to examine microbial ecology

  • To identify taxa that differ between two groups, we propose the use of a two-part statistic, as it is capable of handling the complexities in the distribution of sequence count data
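
The text above does not spell out how the two-part statistic is computed. As a minimal sketch, assuming one commonly described construction (a test on the proportion of non-zero counts combined with a rank-sum test on the non-zero counts, with the sum of the two squared Z statistics referred to a chi-square distribution), the idea might look as follows. The function names, the tie-corrected normal approximation, and the fallback to one degree of freedom when one part is undefined are all illustrative assumptions, not the authors' implementation.

```python
import math

def _average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def proportion_z(x, y):
    """Pooled two-sample Z for the proportion of non-zero counts (None if undefined)."""
    n1, n2 = len(x), len(y)
    k1 = sum(v > 0 for v in x)
    k2 = sum(v > 0 for v in y)
    pooled = (k1 + k2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if var == 0:
        return None  # all zero or all non-zero in both groups
    return (k1 / n1 - k2 / n2) / math.sqrt(var)

def ranksum_z(a, b):
    """Wilcoxon rank-sum Z (normal approximation, tie-corrected); None if undefined."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        return None
    pooled = list(a) + list(b)
    n = n1 + n2
    rank_sum = sum(_average_ranks(pooled)[:n1])
    expected = n1 * (n + 1) / 2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return None  # every non-zero value is identical
    return (rank_sum - expected) / math.sqrt(var)

def two_part_test(x, y):
    """Return (statistic, degrees of freedom, p-value) for two groups of counts."""
    z = proportion_z(x, y)
    w = ranksum_z([v for v in x if v > 0], [v for v in y if v > 0])
    parts = [s for s in (z, w) if s is not None]
    if not parts:
        return 0.0, 0, 1.0
    stat = sum(s * s for s in parts)
    df = len(parts)
    # Chi-square survival function in closed form for 1 or 2 degrees of freedom.
    p = math.exp(-stat / 2) if df == 2 else math.erfc(math.sqrt(stat / 2))
    return stat, df, p

# Hypothetical counts for one taxon in two groups of ten samples each.
group_a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
group_b = [0, 0, 10, 12, 15, 20, 22, 30, 31, 40]
stat, df, p = two_part_test(group_a, group_b)
print(f"X2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

In a real study the test would be applied to each taxon in turn, and the resulting p-values would normally need a multiple-comparison adjustment. With small samples the normal approximations used here may be poor, so an exact or permutation-based version could be preferable.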


Summary

Introduction

The small subunit ribosomal RNA gene (SSU-rRNA) is widely used to examine microbial ecology. Sequencing methods are used to generate data in several areas of human health and across diverse ecological studies. This DNA-based method for bacterial identification has many advantages over culture-based methods and makes it possible to identify organisms without a priori knowledge of the community present [2,3]. Typical data generated from a microbial ecology study consist of SSU-rRNA gene sequence variant counts. These variants serve as a proxy for the diversity and relative abundance of the microbial populations in the community. Sequences are classified according to their relationship to exemplar sequences, which provides taxonomic information about the organism that contributed the template DNA [4].
