Abstract
In meta-analyses that combine multiple biological samples, data contamination considerably affects the results of downstream analyses, such as differential abundance tests comparing multiple groups at a fixed time point. Little is known, however, about the impact of lurking batch variables, such as different days or different laboratories, in more complicated time-series experimental designs, for instance repeatedly measured longitudinal data and metadata. Through a case study of microbiome time-course data with two treatment groups and a simulation study of mimicked longitudinal microbiome counts, we show that batch factors significantly influence downstream analyses, including longitudinal differential abundance tests.
Highlights
The microbiome, referred to as “the entire habitat, including the microorganisms, their genomes, and the surrounding environmental conditions” [1], plays an important role in the host physiology, nutrition and development
An integrated longitudinal data set of 90 samples was assembled from two different trials that used two different primer sets. This large-scale meta-longitudinal microbiome data set, in which the samples from the two trials were sequenced with distinct primer sets (V3/V4 versus V1/V3), was explored with guided PCA [23] to determine whether the known batch factor of primer set is statistically significant
Before detecting time-varying group differences with longitudinal differential abundance tests, we examined whether integrating samples processed on different days with different primer sets introduced significant systematic biases (Movie 1-(D))
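The guided PCA check mentioned in the highlights can be illustrated with a minimal sketch. It assumes a common formulation of guided PCA: the statistic (called `delta` here) is the ratio of the variance captured by the first batch-guided principal component to the variance captured by the first ordinary principal component, and significance is judged by permuting the batch labels. The function names, the synthetic data, and the size of the simulated primer-set shift are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def gpca_delta(X, batch):
    """Guided-PCA delta: variance along the first batch-guided PC
    divided by variance along the first unguided PC (assumed formulation)."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    Xc = X - X.mean(axis=0)  # center each feature (column)
    labels = np.unique(batch)
    # Batch indicator matrix Y: one column per batch level
    Y = (batch[:, None] == labels[None, :]).astype(float)
    # Unguided PCA loadings: right singular vectors of the centered data
    _, _, vt_unguided = np.linalg.svd(Xc, full_matrices=False)
    # Guided PCA loadings: right singular vectors of Y^T X
    _, _, vt_guided = np.linalg.svd(Y.T @ Xc, full_matrices=False)
    var_guided = np.var(Xc @ vt_guided[0], ddof=1)
    var_unguided = np.var(Xc @ vt_unguided[0], ddof=1)
    return var_guided / var_unguided

def gpca_permutation_test(X, batch, n_perm=199, seed=0):
    """Permutation p-value for delta, obtained by shuffling batch labels."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch)
    observed = gpca_delta(X, batch)
    permuted = np.array(
        [gpca_delta(X, rng.permutation(batch)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return observed, p_value

# Synthetic illustration only: 90 samples, 20 features, two hypothetical
# primer-set batches, with an artificial shift added to one batch.
rng = np.random.default_rng(1)
primer_set = np.repeat(["V3V4", "V1V3"], 45)
abundance = rng.normal(size=(90, 20))
abundance[primer_set == "V1V3"] += 3.0  # assumed batch effect

delta, p_value = gpca_permutation_test(abundance, primer_set)
print(f"delta = {delta:.3f}, permutation p-value = {p_value:.3f}")
```

A `delta` close to 1 with a small permutation p-value suggests that the batch factor explains much of the dominant variation. Real count data would normally be filtered and transformed before such a check.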
Summary
The microbiome, referred to as “the entire habitat, including the microorganisms (bacteria, archaea, lower and higher eukaryotes, and viruses), their genomes (i.e., genes), and the surrounding environmental conditions” [1], plays an important role in host physiology, nutrition, and development. Two unique features of the longitudinal (time-series) study design make it the ideal method for understanding the structure and function of the microbiome: time imposes an inherent, irreversible ordering on samples, and samples exhibit statistical dependencies that are a function of time. However, the longitudinal design often suffers from irregular sampling intervals and missing data. We discuss in detail the critical batch issues [17,19,20,21,22,23,24,25] that have emerged in the microbial community, drawing on comprehensive analyses of both large-scale integrated longitudinal microbiome data and simulation studies.