Abstract

Reproducibility is a major issue in microbiome studies, which is partly caused by a lack of consensus about data analysis strategies. The complex nature of microbiome data, which are high-dimensional, zero-inflated, and compositional, makes them challenging to analyze, as they often violate assumptions of classic statistical methods. With advances in human microbiome research, research questions and study designs increase in complexity so that more sophisticated data analysis concepts are applied. To improve current practice of the analysis of microbiome studies, it is important to understand what kind of research questions are asked and which tools are used to answer these questions. We conducted a systematic literature review considering all publications focusing on the analysis of human microbiome data from June 2018 to June 2019. Of 1,444 studies screened, 419 fulfilled the inclusion criteria. Information about research questions, study designs, and analysis strategies was extracted. The results confirmed the expected shift to more advanced research questions, as one-third of the studies analyzed clustered data. Although heterogeneity in the methods used was found at any stage of the analysis process, it was largest for differential abundance testing. Especially if the underlying data structure was clustered, we identified a lack of use of methods that appropriately addressed the underlying data structure while taking into account additional dependencies in the data. Our results confirm considerable heterogeneity in analysis strategies among microbiome studies; increasingly complex research questions require better guidance for analysis strategies.

IMPORTANCE The human microbiome has emerged as an important factor in the development of health and disease. Growing interest in this topic has led to an increasing number of studies investigating the human microbiome using high-throughput sequencing methods. However, the development of suitable analytical methods for analyzing microbiome data has not kept pace with the rapid progression in the field. It is crucial to understand current practice to identify the scope for development. Our results highlight the need for an extensive evaluation of the strengths and shortcomings of existing methods in order to guide the choice of proper analysis strategies. We have identified where new methods could be designed to address more advanced research questions while taking into account the complex structure of the data.

Highlights

  • Reproducibility is a major issue in microbiome studies, which is partly caused by missing consensus about data analysis strategies

  • Issues in microbiome data prohibit the use of classic statistical methods, especially methods designed for low-dimensional data that make specific assumptions about the data, which do not hold in the microbiome context

  • Microbiome data obtained by 16S rRNA amplicon or shotgun metagenomic sequencing are high dimensional, with thousands of taxa present


Summary

Introduction

Reproducibility is a major issue in microbiome studies, which is partly caused by a lack of consensus about data analysis strategies. The human microbiome has emerged as an important factor in the development of health and disease. Growing interest in this topic has led to an increasing number of studies investigating the human microbiome using high-throughput sequencing methods.

Microbiome data are sparse because specific taxa are either not present in some samples (structural zeros) or are not detected due to low abundance (technical zeros). This is especially problematic because microbiome data are compositional and add up to a fixed overall read number [6,7,8], which in itself is variable and mainly determined by technical issues and not the true quantity of microbiota in the original sample.

While many early publications focused on the characterization of different parts of the human microbiome in healthy individuals or in the context of diseases, recent publications focus on more specific links between the microbiome and diseases, e.g., the detection of predictive biomarkers that may enable early diagnosis of diseases or the effect of a disease on the development of the microbiome over time.
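The two data properties described above, sparsity and compositionality, can be made concrete with a small sketch. The counts, sample names, and helper functions below are hypothetical and chosen only for illustration; they are not taken from the reviewed studies. The sketch assumes a sample is a plain list of per-taxon read counts.

```python
# Illustrative only: toy counts, not data from the reviewed studies.

def relative_abundance(counts):
    """Scale raw read counts so that the sample sums to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def zero_fraction(samples):
    """Share of zero entries across all samples (a simple sparsity measure)."""
    cells = [c for sample in samples for c in sample]
    return sum(1 for c in cells if c == 0) / len(cells)


# Two hypothetical samples with four taxa each. Sample B was sequenced
# ten times deeper than sample A, but the composition is identical.
sample_a = [50, 30, 20, 0]
sample_b = [500, 300, 200, 0]

# The total read number differs for technical reasons only, so the raw
# counts are not comparable; the relative abundances are the same.
print(relative_abundance(sample_a))
print(relative_abundance(sample_b))

# The zero in the last taxon could be structural (taxon absent) or
# technical (taxon below the detection limit); counts alone cannot tell.
print(zero_fraction([sample_a, sample_b]))
```

Because only proportions carry information, an apparent increase in one taxon forces an apparent decrease in the others, which is one reason methods designed for unconstrained data can mislead here.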

