Abstract

The human microbiome comprises thousands of microbial species. These species substantially influence normal human physiology and contribute to numerous diseases. Microbiome data can be measured by sequencing, microarray, or other technologies. As these technologies develop rapidly, downstream analysis methods must also be designed to discover, effectively and accurately, the valuable information hidden in the data. Many methods have been designed for microbiome count data. However, to our knowledge, only a few methods have been developed for continuous microbiome data. Many microbiome datasets are over-dispersed and zero-inflated. Traditional methods rarely characterize this structure and focus only on differences in abundance between samples. In this study, we introduce a novel method, the zero-inflated gamma (ZIG) omnibus test, designed specifically for continuous, zero-inflated microbiome data. The test assesses abundance jointly with zero prevalence and dispersion. We compared this method with five other popular methods. On simulated data, the ZIG omnibus test had significantly higher power and a similar or lower false-positive rate than the competing methods. On real data from a tonsil cancer study, it also identified more microbial taxa supported by prior evidence. We conclude that the ZIG omnibus test is a robust method across various biological conditions for the differential expression test of microbiome data.

Highlights

  • The human body harbors more than 1 trillion microbes, which have been shown to contribute to the host’s health and growth

  • We proposed an omnibus test of all three parameters (prevalence, abundance, and dispersion) to identify associated microbial taxa with a zero-inflated gamma (ZIG) model

  • Among the signature species uniquely detected by the ZIG omnibus test, we found some Streptococcus and Staphylococcus species in the cancer group


Summary

INTRODUCTION

The human body harbors more than 1 trillion microbes, which have been shown to contribute to the host’s health and growth. To improve the robustness of the model, dispersion should be treated as a covariate-dependent parameter, like abundance and prevalence. We therefore proposed an omnibus test of all three parameters (prevalence, abundance, and dispersion) to identify associated microbial taxa with a zero-inflated gamma (ZIG) model. Since these parameters characterize the entire data distribution, our method can detect general differential distribution beyond the previous differential abundance analysis (see below), resulting in more accurate and valuable results. We compared the power and type I error of the ZIG omnibus test with those of five common methods using simulated data with the same or different abundance, prevalence, and/or dispersion. We showed that our proposed framework is robust across various data structures and biological conditions and is more powerful than the competing methods.
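
To make the three parameters concrete, the following is a minimal Python sketch of a zero-inflated gamma distribution, not the authors' implementation. It assumes one common parameterization that this summary does not specify: an observation is zero with probability `zero_prob` (one minus prevalence); otherwise it is gamma-distributed with mean `mean` (abundance) and variance `mean**2 * dispersion`. All function and parameter names are hypothetical.

```python
import math
import random


def zig_logpdf(y, zero_prob, mean, dispersion):
    """Log-density of one value under an assumed ZIG parameterization.

    zero_prob  : probability of an exact zero (1 - prevalence)
    mean       : mean of the nonzero (gamma) part, i.e. abundance
    dispersion : gamma variance divided by mean**2
    """
    if y < 0:
        raise ValueError("ZIG data must be non-negative")
    if y == 0:
        return math.log(zero_prob)
    # Convert (mean, dispersion) to the usual gamma shape and scale.
    shape = 1.0 / dispersion
    scale = mean * dispersion
    log_gamma_density = (
        (shape - 1.0) * math.log(y)
        - y / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.log1p(-zero_prob) + log_gamma_density


def zig_loglik(values, zero_prob, mean, dispersion):
    """Total log-likelihood of a sample under one parameter triple."""
    return sum(zig_logpdf(y, zero_prob, mean, dispersion) for y in values)


def zig_sample(n, zero_prob, mean, dispersion, rng):
    """Draw n values: zero with probability zero_prob, else a gamma draw."""
    shape = 1.0 / dispersion
    scale = mean * dispersion
    return [
        0.0 if rng.random() < zero_prob else rng.gammavariate(shape, scale)
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical groups with the same abundance and prevalence
    # but different dispersion.
    group_a = zig_sample(500, zero_prob=0.4, mean=3.0, dispersion=0.5, rng=rng)
    group_b = zig_sample(500, zero_prob=0.4, mean=3.0, dispersion=2.0, rng=rng)
    print(zig_loglik(group_a, 0.4, 3.0, 0.5))
    print(zig_loglik(group_b, 0.4, 3.0, 2.0))
```

In the demo, the two groups share abundance and prevalence and differ only in dispersion, which is the kind of distributional difference an abundance-only comparison would not target. An omnibus test would compare a fit with group-specific parameters against a fit with shared parameters; that fitting step is omitted here.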

ZERO-INFLATED GAMMA MODEL
MODEL FITTING
REAL DATA APPLICATIONS
Findings
DISCUSSION
