Abstract
A huge quantity of microbiome samples has been accumulated, and more are yet to come from all niches around the globe. With this accumulation of data, there is an urgent need to compare and search microbiome samples among thousands of millions of samples in a fast and accurate manner. However, identifying similar samples and their likely origins among such a grand pool of samples from all around the world is a very difficult computational challenge. Several approaches have been proposed for this challenge, based on distance calculation, unsupervised algorithms, or supervised algorithms. These methods have advantages and disadvantages in different comparison and search settings, and their results also differ drastically. In this review, we systematically compared distance-based, unsupervised, and supervised methods for microbiome sample comparison and search. First, we assessed their accuracy and efficiency, both in theory and in practice. Second, we described the scenarios in which one or multiple methods were applicable for sample searches. Third, we presented several applications of microbiome sample comparison and search, and provided suggestions on the choice of methods. Finally, we offered several perspectives on the future development of microbiome sample comparison and search, including deep learning technologies for tracking the sources of microbiome samples.
Highlights
Microbiome samples are accumulating at an accelerating rate, representing microbial communities from every niche of the human body as well as other host organisms, environments, and ecological biomes (Mitchell et al., 2020; Figure 1).
Library-independent methods (LIMs) can perform well when source tracking with thousands of samples and hundreds of biomes, but it is difficult for library-dependent methods (LDMs) to deal with such situations due to limitations of accuracy and efficiency.
Although unsupervised methods are accurate for microbiome sample comparison and search, model-based methods can be viewed as solving the same problem with higher accuracy and speed.
Summary
Microbiome samples are accumulating at an accelerating rate, representing microbial communities from every niche (biome) of the human body as well as other host organisms, environments, and ecological biomes (Mitchell et al., 2020; Figure 1). Library-dependent methods (LDMs) and library-independent methods (LIMs) can both achieve good performance for microbial source tracking (MST) with a small number of microbial community samples (usually from a handful to dozens of samples) and a few biomes (usually no more than 10 biomes). LIMs can perform well when source tracking with thousands of samples and hundreds of biomes, but it is difficult for LDMs to deal with such situations due to limitations of accuracy and efficiency.
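
As a rough illustration of the distance-based family of methods mentioned in the abstract, the following minimal Python sketch ranks reference samples by Bray-Curtis dissimilarity to a query sample and reports the biome label of the closest ones. It is not taken from the reviewed tools: the choice of Bray-Curtis, the function names `bray_curtis` and `search_nearest`, and the taxon and biome names in the example are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A profile maps a taxon name to its relative abundance in one sample.
Profile = Dict[str, float]


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    # Two empty profiles are treated as identical (an assumption of this sketch).
    return numerator / denominator if denominator > 0 else 0.0


def search_nearest(
    query: Profile,
    references: List[Tuple[str, Profile]],
    top_k: int = 1,
) -> List[Tuple[float, str]]:
    """Return the top_k (dissimilarity, biome label) pairs, closest first."""
    scored = [(bray_curtis(query, profile), label) for label, profile in references]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_k]


# Hypothetical reference samples, each labelled with an illustrative biome.
references = [
    ("gut", {"Bacteroides": 0.6, "Faecalibacterium": 0.4}),
    ("soil", {"Acidobacteria": 0.7, "Streptomyces": 0.3}),
    ("oral", {"Streptococcus": 0.8, "Bacteroides": 0.2}),
]
query = {"Bacteroides": 0.5, "Faecalibacterium": 0.3, "Streptococcus": 0.2}

for dissimilarity, biome in search_nearest(query, references, top_k=3):
    print(f"{biome}: {dissimilarity:.2f}")
```

Because every query is compared against every reference sample, the cost of this kind of pairwise search grows linearly with the size of the reference pool, which is one reason such approaches become difficult at the scale of thousands of samples and hundreds of biomes.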