VTwins: inferring causative microbial features from metagenomic data of limited samples

Qingren Meng, Qian Zhou, Shuo Shi, Jingfa Xiao, Qin Ma, Jun Yu, Jun Chen, Yu Kang

doi:10.1016/j.scib.2023.10.024

Abstract

Inferring causality from high-dimensional metagenomic data is difficult because of interference from numerous confounders. Imitating twin studies in genetic research, we develop a straightforward method, virtual twins (VTwins), that eliminates confounder effects by transforming the original cohort into a paired cohort of “Twin” samples with distinct phenotypes but matched taxonomic profiles. In tests on simulated and empirical data, VTwins outperforms the conventional approach in sensitivity for identifying causative features and requires a 10-fold smaller sample size to recall disease-associated microbes or pathways. A benchmark against 16 other software tools further validates the power and applicability of VTwins for handling high-dimensional compositional datasets and mining causalities in metagenomic research. In conclusion, VTwins is straightforward and effective in handling high-diversity, high-dimensional compositional data, and shows promise for mining causalities in metagenomic and potentially other omics data. VTwins is open access and available at https://github.com/mengqingren/VTwins.
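The pairing idea described in the abstract can be illustrated with a minimal sketch. This is not the VTwins implementation: the Bray-Curtis dissimilarity, the greedy one-to-one matching, the `max_dist` threshold, the exact sign test, and all function names are assumptions chosen only to show how an unpaired case/control cohort might be turned into matched pairs and then tested feature by feature.

```python
from math import comb


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def match_twins(cases, controls, max_dist=0.5):
    """Greedily pair each case with its closest unused control.

    max_dist is a hypothetical cutoff: candidate pairs that are less
    similar than this are never formed. Returns (case_idx, control_idx, dist).
    """
    candidates = sorted(
        (bray_curtis(c, k), i, j)
        for i, c in enumerate(cases)
        for j, k in enumerate(controls)
    )
    used_cases, used_controls, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even less similar
        if i in used_cases or j in used_controls:
            continue
        used_cases.add(i)
        used_controls.add(j)
        pairs.append((i, j, dist))
    return pairs


def sign_test(diffs):
    """Two-sided exact sign test on paired differences (zeros dropped)."""
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def paired_feature_tests(cases, controls, pairs):
    """One sign-test p-value per feature, computed across matched pairs."""
    n_features = len(cases[0])
    return [
        sign_test([cases[i][f] - controls[j][f] for i, j, _ in pairs])
        for f in range(n_features)
    ]
```

Here `cases` and `controls` are lists of relative-abundance vectors over the same features. Because each pair shares a similar overall profile, differences within a pair are less likely to reflect background community composition, which is the intuition behind the twin analogy.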

Full Text