Exploiting topic modeling to boost metagenomic reads binning.

Ruichang Zhang, Shuigeng Zhou, Jihong Guan, Zhanzhan Cheng

doi:10.1186/1471-2105-16-s5-s2

Open Access

https://doi.org/10.1186/1471-2105-16-s5-s2

Journal: BMC Bioinformatics	Publication Date: Mar 18, 2015
Citations: 36	License type: cc-by

Affiliation: Fudan University, Tongji University

Abstract

Background: With the rapid development of high-throughput technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. Assigning these metagenomic reads to different species or taxonomic classes, known as binning, is a vital step in metagenomic analysis.

Results: In this paper, we propose a new method, TM-MCluster, for binning metagenomic reads. First, we represent each metagenomic read as a set of k-mers together with their frequencies in the read. Then, we apply a probabilistic topic model, Latent Dirichlet Allocation (LDA), to the reads; it generates a number of hidden "topics" so that each read can be represented by a distribution vector over those topics. Finally, as in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature-weighting mechanism, to cluster the reads represented by their topic distributions.

Conclusions: Experiments show that TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that topic modeling can effectively improve the binning performance of metagenomic reads.
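The k-mer representation used in the first step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_counts` and `kmer_vector`, the default `k = 4`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(read: str, k: int = 4) -> Counter:
    """Count every overlapping k-mer in a read.

    Windows containing symbols other than A, C, G, T (for example N)
    are skipped; this is an illustrative choice, not taken from the paper.
    """
    read = read.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def kmer_vector(read: str, k: int = 4) -> list:
    """Return k-mer counts as a fixed-order vector over all 4**k k-mers."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(read, k)
    return [counts[kmer] for kmer in vocab]


if __name__ == "__main__":
    print(kmer_counts("ACGTACGT", k=4))
```

Each read then plays the role of a "document" whose "words" are its k-mers, which is what makes a topic model applicable in the next step.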

Highlights

Due to the limitations of biological experiments, traditional microbial genomic studies focus on individual bacterial genomes.
The series of MetaCluster algorithms can automatically determine the number of clusters, which is extremely important for binning metagenomic reads because most samples in real datasets come from unknown species.
The proposed method TM-MCluster consists of three major steps: 1) representing each read as a vector of k-mer occurrence frequencies; 2) transforming each read vector into a topic distribution vector based on the Latent Dirichlet Allocation (LDA) model [15]; 3) clustering the vectorized reads with the SKWIC algorithm [16], as in the MCluster method [14].
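The second step, turning a read's k-mers into a topic distribution, can be sketched with a small collapsed Gibbs sampler for LDA. This is a generic textbook sampler written for illustration, not the inference procedure or software used in the paper; `read_to_tokens`, `lda_gibbs`, and the hyperparameter defaults (`alpha`, `beta`, `n_iter`) are hypothetical.

```python
import random

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def read_to_tokens(read, k):
    """Encode each valid k-mer of a read as an integer id in [0, 4**k)."""
    tokens = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if all(base in BASE_INDEX for base in kmer):
            idx = 0
            for base in kmer:
                idx = idx * 4 + BASE_INDEX[base]
            tokens.append(idx)
    return tokens


def lda_gibbs(docs, vocab_size, n_topics, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs is a list of token-id lists (one per read). Returns one topic
    distribution per read, estimated from the final sample.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in topics]
        for d, doc in enumerate(docs)
    ]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTTT", "GGGCCCGGGC", "GGCCCGGGCC"]
    docs = [read_to_tokens(r, 2) for r in reads]
    for row in lda_gibbs(docs, vocab_size=16, n_topics=2, n_iter=100):
        print([round(p, 3) for p in row])
```

The returned vectors have one entry per topic and sum to 1, so they are typically much shorter than the raw k-mer vectors they replace. Because the sampler is stochastic, the exact values depend on the seed and the number of iterations.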

Summary

Results

We propose a new method, TM-MCluster, for binning metagenomic reads. We represent each metagenomic read as a set of k-mers together with their frequencies in the read. We apply a probabilistic topic model, Latent Dirichlet Allocation (LDA), to the reads; it generates a number of hidden "topics" so that each read can be represented by a distribution vector over those topics. As in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature-weighting mechanism, to cluster the reads represented by their topic distributions.
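The idea behind the clustering step, K-means with automatically learned feature weights, can be sketched as follows. This is deliberately not the SKWIC update rule: the sketch swaps in a simple exponential weighting of per-cluster feature dispersion, and the function name `weighted_kmeans`, the `gamma` parameter, and the use of caller-supplied initial centers are assumptions for illustration.

```python
import math


def weighted_kmeans(points, init_centers, gamma=1.0, n_iter=20):
    """K-means with per-cluster feature weights (simplified, not SKWIC).

    Features with low within-cluster dispersion receive larger weights in
    that cluster's distance function. Returns (labels, centers, weights).
    """
    k = len(init_centers)
    dim = len(points[0])
    centers = [list(c) for c in init_centers]
    weights = [[1.0 / dim] * dim for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: weighted squared Euclidean distance.
        for j, x in enumerate(points):
            dists = [
                sum(w * (a - b) ** 2
                    for w, a, b in zip(weights[c], x, centers[c]))
                for c in range(k)
            ]
            labels[j] = dists.index(min(dists))
        # Update step: recompute centers, then feature weights.
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # keep the previous center and weights
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
            disp = [
                sum((x[f] - centers[c][f]) ** 2 for x in members)
                for f in range(dim)
            ]
            # Subtract the minimum before exponentiating to avoid underflow.
            low = min(disp)
            raw = [math.exp(-(d - low) / gamma) for d in disp]
            total = sum(raw)
            weights[c] = [r / total for r in raw]
    return labels, centers, weights


if __name__ == "__main__":
    # Toy topic-distribution vectors for four reads.
    topic_vectors = [
        [0.9, 0.1, 0.0],
        [0.8, 0.2, 0.0],
        [0.1, 0.9, 0.0],
        [0.2, 0.8, 0.0],
    ]
    labels, centers, weights = weighted_kmeans(
        topic_vectors, init_centers=[topic_vectors[0], topic_vectors[2]]
    )
    print(labels)
```

In a full pipeline the input points would be the per-read topic distributions, and the resulting cluster labels would serve as bin assignments.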
