Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype.

Wontack Han, Yuzhen Ye, Haixu Tang

doi:10.1089/cmb.2021.0640

Abstract

Microbial organisms play important roles in many aspects of human health and disease. Encouraged by the numerous studies that show associations between microbiomes and human diseases, researchers have recently developed computational and machine learning methods that generate and utilize microbiome features to predict host phenotypes, such as disease versus healthy or cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on the extraction and assembly of differential reads from metagenomic data sets, that is, reads likely sampled from genomes or genes that differ between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improve our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into k-mer coabundance groups (kCAGs), which were subsequently used to retrieve differential kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated and real microbiome data sets showed that, compared with the conventional approach that utilizes all genes, our approach can quickly identify differential genes that can be used to build promising predictive models for microbiome-based host phenotype prediction. We also discuss other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples.
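The idea of grouping k-mers by abundance profile with LSH can be illustrated with a minimal sketch. The abstract does not say which hash family kmerLSHSA uses, so the example below assumes random-hyperplane (sign random projection) hashing on mean-centered profiles; the function names, the toy k-mers, and the parameters `n_bits` and `seed` are all hypothetical and are not taken from the authors' implementation.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_samples, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in sample space (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_bits)]


def signature(profile, hyperplanes):
    """Hash one abundance profile to a tuple of sign bits.

    The profile is mean-centered first, so k-mers whose abundances rise and
    fall together across samples point in similar directions.
    """
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    return tuple(
        1 if sum(c * h for c, h in zip(centered, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def cluster_kmers(profiles, n_bits=8, seed=0):
    """Bucket k-mers by LSH signature; each bucket is a candidate coabundance group."""
    n_samples = len(next(iter(profiles.values())))
    planes = make_hyperplanes(n_samples, n_bits, seed)
    buckets = defaultdict(list)
    for kmer, profile in profiles.items():
        buckets[signature(profile, planes)].append(kmer)
    return dict(buckets)


# Toy abundance profiles: one count per sample for each k-mer (made-up values).
profiles = {
    "ACGTA": [10, 0, 5, 1],
    "CGTAC": [20, 0, 10, 2],  # same shape as ACGTA at twice the depth
    "TTGCA": [1, 3, 1, 3],
    "GGATC": [3, 1, 3, 1],    # opposite pattern to TTGCA
}

groups = cluster_kmers(profiles)
for sig, kmers in groups.items():
    print(sig, kmers)
```

Each k-mer is hashed once, so the bucketing cost grows linearly with the number of k-mers rather than with the number of k-mer pairs, which is what makes this kind of grouping feasible at large scale. In this sketch, profiles that are exact multiples of each other always share a bucket, while merely similar profiles share one only with high probability; a practical pipeline would likely use several hash tables and a refinement step within buckets.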

Journal: Journal of computational biology : a journal of computational molecular cell biology
Publication Date: May 17, 2022
Citations: 2
License type: cc-by-nc
