Abstract

High-throughput genomics technologies are applied widely to microbiomes in humans, animals, soil and water, to detect changes in bacterial communities, or in the genes they carry, between different environments or treatments. We describe a method to test the statistical significance of differences in bacterial population or gene composition, applicable to metagenomic or quantitative polymerase chain reaction data. Our method goes beyond previously published work in being uniformly most powerful, and thus better able to detect statistically significant differences, and in being more reliable for smaller sample sizes. It can also be used for experimental design, to estimate how many samples to use in future experiments, again with the advantage of being uniformly most powerful. We present three example analyses in the area of antimicrobial resistance. The first uses published data on bacterial communities and antimicrobial resistance genes (ARGs) in the environment; we show that there are significant changes in both ARG and community composition. The second uses new data on seasonality in bacterial communities and ARGs in hooves from four sheep. While the observed differences are not significant, we show that a minimum group size of eight sheep would provide sufficient power to observe significance of similar changes in further experiments. The third uses published data on bacterial communities surrounding rice crops. This is a much larger data set and is used to verify the new method. Our method has broad uses for statistical testing and experimental design in research on changing microbiomes, including studies on antimicrobial resistance.

Highlights

  • Bacteria live in complex communities, whether in water, in soil, or on larger organisms, as the microbiota of organs such as the gut or the skin

  • By composition, we refer to the proportions of individual taxonomic or gene classes within the population; this can be contrasted with abundance, which refers to the number of individuals, either overall or of specific taxonomic or gene classes, and with overall structure, which takes into account both abundance and composition

  • La Rosa et al. model entire populations of metagenomic data according to a Dirichlet-multinomial distribution; surprisingly, their method uses Wald’s test to determine p-values, which is less reliable than likelihood ratio tests or score tests, especially for smaller sample sizes (n < 10), which are common in many studies[20]

Summary

Introduction

Bacteria live in complex communities, whether in water, in soil, or on larger organisms, as the microbiota of organs such as the gut or the skin. Methods such as diversity indices or principal coordinate analysis[11] have allowed a more in-depth analysis of population structures than pie charts or compositional bar charts. Methods developed for cDNA library[15] or RNA-seq analysis[16,17,18], though not commonly used in this context, could be applied to metagenomics data on a taxon-by-taxon or gene-class-by-gene-class basis to identify individual taxa or gene classes that are significantly different. While such analyses could be of value, they do not answer the question of whether the overall community has changed. It is long established that likelihood ratio tests are more generally appropriate than Wald’s test, because they have the property of being uniformly most powerful: for a fixed false positive error rate, likelihood ratio tests provide the smallest rate of false negative errors.
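
To make the idea of a likelihood ratio test on overall composition concrete, the following is a minimal sketch and not the method of this paper. It assumes a plain multinomial model in which reads are pooled within each group, so it ignores the between-sample overdispersion that a Dirichlet-multinomial model is designed to capture, and it relies on the asymptotic chi-square distribution of the test statistic. The function names (`chi2_sf`, `composition_lrt`) and the counts are hypothetical and chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the closed-form recurrence for the regularized upper incomplete
    gamma function Q(a, y) with a = df / 2 and y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def composition_lrt(group_a, group_b):
    """Likelihood ratio test of equal composition between two groups.

    Each group is a list of per-sample count vectors (one count per taxon
    or gene class). Assumption: counts are pooled within each group and
    treated as multinomial, which is a simplification.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    pooled_a = [sum(col) for col in zip(*group_a)]
    pooled_b = [sum(col) for col in zip(*group_b)]
    total = [a + b for a, b in zip(pooled_a, pooled_b)]
    n_a, n_b, n = sum(pooled_a), sum(pooled_b), sum(total)

    stat = 0.0
    for observed, n_g in ((pooled_a, n_a), (pooled_b, n_b)):
        for x, t in zip(observed, total):
            if x > 0:
                expected = n_g * t / n         # expected count under the null
                stat += 2.0 * x * math.log(x / expected)
    stat = max(stat, 0.0)                      # guard against rounding below zero

    # Classes that are absent from both groups carry no information.
    df = sum(1 for t in total if t > 0) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: three samples per group, three classes.
before = [[50, 30, 20], [55, 25, 20], [48, 32, 20]]
after = [[30, 40, 30], [28, 45, 27], [35, 38, 27]]
stat, df, p = composition_lrt(before, after)
print(f"statistic={stat:.2f}, df={df}, p={p:.3g}")
```

Because pooling treats every read as independent, this simplified test will tend to overstate significance when samples within a group vary more than a multinomial model allows, which is one reason overdispersed models such as the Dirichlet-multinomial are used for microbiome data.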
