Microbiome Sample Comparison and Search: From Pair-Wise Calculations to Model-Based Matching

Yuguo Zha, Kang Ning, Hui Chong

doi:10.3389/fmicb.2021.642439

Open Access

Abstract

A huge quantity of microbiome samples has accumulated, and more are yet to come from niches all around the globe. As data accumulate, there is an urgent need to compare and search microbiome samples quickly and accurately among thousands to millions of samples. However, identifying similar samples, as well as their likely origins, within such a large pool of samples from around the world is a difficult computational challenge. Several approaches have been proposed to address it, based on distance calculation, unsupervised algorithms, or supervised algorithms. These methods have different advantages and disadvantages depending on the comparison and search setting, and their results also differ drastically. In this review, we systematically compared distance-based, unsupervised, and supervised methods for microbiome sample comparison and search. First, we assessed their accuracy and efficiency, both in theory and in practice. Second, we described the scenarios in which one or multiple methods were applicable for sample searches. Third, we presented several applications of microbiome sample comparison and search, and provided suggestions on the choice of methods. Finally, we offered several perspectives on the future development of microbiome sample comparison and search, including deep learning technologies for tracking the sources of microbiome samples.
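The distance-based approach can be illustrated with a minimal sketch, assuming each sample is represented as a vector of taxon relative abundances over a shared set of taxa. Bray-Curtis dissimilarity is chosen here for illustration only; the methods compared in the review may rely on other metrics. The function names `bray_curtis` and `search` and the toy abundance table are hypothetical.

```python
import numpy as np


def bray_curtis(query, database):
    """Bray-Curtis dissimilarity between one query profile and each database row.

    BC(u, v) = sum_i |u_i - v_i| / sum_i (u_i + v_i); 0 means identical profiles.
    Assumes no pair of profiles is entirely zero (the denominator would be 0).
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    numer = np.abs(database - query).sum(axis=1)
    denom = (database + query).sum(axis=1)
    return numer / denom


def search(query, database, k=3):
    """Exhaustive search: score every stored sample, return the k most similar."""
    d = bray_curtis(query, database)
    order = np.argsort(d, kind="stable")[:k]
    return order, d[order]


# Hypothetical relative-abundance table: 4 stored samples x 4 taxa.
database = np.array([
    [0.5, 0.3, 0.2, 0.0],
    [0.1, 0.1, 0.4, 0.4],
    [0.3, 0.5, 0.1, 0.1],
    [0.0, 0.0, 0.5, 0.5],
])
query = np.array([0.45, 0.35, 0.15, 0.05])
idx, dist = search(query, database, k=2)
print(idx, np.round(dist, 3))
```

The query is closest to samples 0 and 2, which share its dominance by the first two taxa. Because every stored sample is scored for every query, the cost of this pair-wise search grows linearly with database size, which is one reason it becomes expensive as sample collections grow very large.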

Highlights

Microbiome samples are accumulating at an accelerating rate, representing microbial communities from every niche of the human body as well as other host organisms, environments, and ecological biomes (Mitchell et al., 2020; Figure 1)
Library-independent methods (LIMs) can perform well for source tracking with thousands of samples and hundreds of biomes, whereas library-dependent methods (LDMs) struggle in such settings because of limitations in accuracy and efficiency
Although unsupervised methods are accurate for microbiome sample comparison and search, one can readily envision model-based methods solving the same problem with higher accuracy and speed
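The contrast between pair-wise search and model-based matching can be sketched with a deliberately simple supervised model: a nearest-centroid classifier that summarizes each biome by the mean abundance profile of its training samples. This model is an assumption chosen for brevity, not one used in the review, whose perspective points toward deep learning models instead; the class name, biome labels, and data are hypothetical.

```python
import numpy as np


class NearestCentroidBiomeModel:
    """Hypothetical minimal supervised matcher: one mean profile per biome."""

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        self.biomes_ = sorted(set(labels.tolist()))
        # Compress all training samples of a biome into its mean abundance profile.
        self.centroids_ = np.stack([X[labels == b].mean(axis=0) for b in self.biomes_])
        return self

    def predict(self, query):
        # Euclidean distance to each centroid: one comparison per biome, not per sample.
        d = np.linalg.norm(self.centroids_ - np.asarray(query, dtype=float), axis=1)
        return self.biomes_[int(np.argmin(d))]


# Hypothetical training data: 4 samples x 4 taxa with biome labels.
X_train = np.array([
    [0.6, 0.3, 0.1, 0.0],
    [0.5, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.4, 0.5],
    [0.1, 0.0, 0.5, 0.4],
])
y_train = ["gut", "gut", "soil", "soil"]
model = NearestCentroidBiomeModel().fit(X_train, y_train)
print(model.predict([0.5, 0.3, 0.2, 0.0]))
```

After training, prediction cost depends on the number of biomes rather than the number of stored samples, which illustrates the speed side of the highlight. Whether accuracy also improves depends on the model and the data; a nearest-centroid rule like this one will underfit biomes with heterogeneous communities.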

Summary

Introduction

Microbiome samples are accumulating at an accelerating rate, representing microbial communities from every niche (biome) of the human body as well as other host organisms, environments, and ecological biomes (Mitchell et al., 2020; Figure 1). Both library-dependent methods (LDMs) and library-independent methods (LIMs) can perform well for microbial source tracking (MST) when there are only a small number of microbial community samples (usually a handful to dozens) and a few biomes (usually no more than 10). LIMs can also perform well with thousands of samples and hundreds of biomes, whereas LDMs struggle in such settings because of limitations in accuracy and efficiency.
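One common way to frame MST is to estimate what fraction of a sink (query) community comes from each candidate source. The sketch below does this with a simplified expectation-maximization (EM) mixture model, assuming the source abundance profiles are known and fixed and that every sink read originates from one of the listed sources (no "unknown" source). It is not the algorithm of any specific MST tool, whether library-dependent or library-independent; the function name, sources, and counts are hypothetical.

```python
import numpy as np


def estimate_source_proportions(sink_counts, source_profiles, n_iter=500, tol=1e-10):
    """Estimate mixing proportions alpha by EM.

    Assumed model: the sink's reads are a multinomial draw from
    sum_j alpha_j * source_profiles[j].
    source_profiles: (n_sources, n_taxa) array, each row sums to 1.
    sink_counts: (n_taxa,) read counts of the sink sample.
    """
    sink = np.asarray(sink_counts, dtype=float)
    S = np.asarray(source_profiles, dtype=float)
    alpha = np.full(S.shape[0], 1.0 / S.shape[0])  # uniform starting guess
    for _ in range(n_iter):
        mix = alpha @ S  # predicted sink profile, shape (n_taxa,)
        # E-step: fraction of each taxon's reads attributed to each source.
        resp = alpha[:, None] * S / np.where(mix > 0, mix, 1.0)
        # M-step: each source's share of all attributed reads.
        new_alpha = resp @ sink
        new_alpha /= new_alpha.sum()  # ignores reads on taxa absent from every source
        converged = np.abs(new_alpha - alpha).max() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Hypothetical sources over four taxa, and a sink built as 30% "gut" + 70% "soil".
sources = np.array([
    [0.7, 0.2, 0.1, 0.0],  # "gut"
    [0.0, 0.1, 0.3, 0.6],  # "soil"
])
sink = np.array([210, 130, 240, 420])
alpha = estimate_source_proportions(sink, sources)
print(np.round(alpha, 3))
```

Because the toy sink was constructed exactly from the two source profiles, EM recovers the 30%/70% split. Real MST tools must additionally handle unknown sources, sequencing noise, and, at the scale described above, hundreds of candidate biomes.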

Methods

Results

Conclusion