Recent studies have highlighted the importance of human microbiota in our health and diseases. However, in many areas of research, individual microbiome studies often offer inconsistent results due to the limited sample sizes and the heterogeneity in study populations and experimental procedures. This inconsistency underscores the necessity for integrative analysis of multiple microbiome datasets. Despite the critical need, statistical methods that incorporate multiple microbiome datasets and account for the study heterogeneity are not available in the literature. In this paper, we develop a mixed effect similarity matrix regression (SMRmix) approach for identifying community level microbiome shifts between outcomes. SMRmix has a close connection with the microbiome kernel association test, one of the most popular approaches for such a task but is only applicable when we have a single study. SMRmix enables researchers to consolidate findings from diverse microbiome studies. Via extensive simulations, we show that SMRmix has well-controlled type I error and higher power than some potential competitors. We applied the SMRmix to two real-world datasets. The first, from the HIV-reanalysis consortium, integrated data from 17 studies on gut dysbiosis in HIV. Our analysis confirmed consistent associations between the gut microbiome and HIV infection as well as MSM (men who have sex with men) status, demonstrating greater power than competing methods. The second dataset involved 11 studies on the gut microbiome in colorectal cancer; analysis with SMRmix confirmed significant dysbiosis in affected individuals compared to healthy controls. The development of SMRmix enables the integration of multiple studies and effectively managing study heterogeneity, and provides a powerful tool for uncovering consistent associations between diseases and community-level microbiome data.
Read full abstract