The experimental methods employed during metagenomic sequencing analyses of microbiome samples significantly impact the resulting data and typically vary substantially between laboratories. In this study, a full factorial experimental design was used to compare the effects of a select set of methodological choices (sample, operator, lot, extraction kit, variable region, and reference database) on the analysis of biologically diverse stool samples. For each parameter investigated, a main effect was calculated that allowed direct comparison both between methodological choices (bias effects) and between samples (real biological differences). Overall, methodological bias was found to be similar in magnitude to real biological differences while also exhibiting significant variations between individual taxa, even between closely related genera. The quantified method biases were then used to computationally improve the comparability of data sets collected under substantially different protocols. This investigation demonstrates a framework for quantitatively assessing methodological choices that could be routinely performed by individual laboratories to better understand their metagenomic sequencing workflows and to improve the scope of the datasets they produce.

IMPORTANCE
Method-specific bias is a well-recognized challenge in metagenomic sequencing characterization of microbiome samples, but rigorous bias quantification is challenging. This report details a full factorial exploration of 48 experimental protocols by systematically varying microbiome sample, iterations of material production, laboratory personnel, DNA extraction kit, marker gene selection, and reference databases. Quantification of the biases associated with each parameter revealed similar magnitudes of variation arising from real biological differences and from varied analysis procedures. Furthermore, these measurement biases varied substantially with taxa, even between closely related genera.
However, computational correction of method bias using a reference material significantly harmonized metagenomic sequencing results collected using different analysis protocols.
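
To make the idea of a main effect in a full factorial design concrete, the following is a minimal sketch in Python. It is not the study's analysis code: the factor names are borrowed from the abstract, but the two-level structure, the level labels, the helper functions `full_factorial` and `main_effect`, and the synthetic response values are all assumptions chosen for illustration. The response could stand for, say, a transformed abundance of a single taxon.

```python
from itertools import product
from statistics import mean

# Hypothetical factors with placeholder level names (not the study's actual levels).
factors = {
    "sample": ["sample_1", "sample_2"],
    "kit": ["kit_1", "kit_2"],
    "region": ["region_1", "region_2"],
}


def full_factorial(factors):
    """Enumerate every combination of factor levels, one dict per run."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def main_effect(runs, responses, factor, low, high):
    """Mean response at the high level minus mean response at the low level."""
    at_high = [y for run, y in zip(runs, responses) if run[factor] == high]
    at_low = [y for run, y in zip(runs, responses) if run[factor] == low]
    return mean(at_high) - mean(at_low)


def synthetic_response(run):
    """Made-up additive response, used only so the example has data to analyze."""
    y = 0.0
    if run["sample"] == "sample_2":
        y += 2.0   # assumed biological difference between samples
    if run["kit"] == "kit_2":
        y += 1.5   # assumed extraction-kit bias
    if run["region"] == "region_2":
        y -= 0.5   # assumed variable-region bias
    return y


runs = full_factorial(factors)            # 2 x 2 x 2 = 8 runs
responses = [synthetic_response(r) for r in runs]

# One main effect per factor, all expressed on the same scale.
effects = {
    name: main_effect(runs, responses, name, levels[0], levels[1])
    for name, levels in factors.items()
}

for name, value in effects.items():
    print(name, value)
```

Because every factor is balanced across the levels of all the others, each main effect isolates the contribution of one factor, which is what lets the effect of `sample` (a biological difference) be compared directly with the effects of `kit` or `region` (method biases) in this toy setup.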