Ensemble subspace clustering of text data using two-level features

He Zhao, Salman Salloum, Joshua Zhexue Huang, Yeshou Cai

doi:10.1007/s13042-016-0556-5

Abstract

This paper proposes a new integrated method for ensemble subspace clustering of high-dimensional, sparse text data. Our method employs a two-level feature representation of text data (words and topics) to generate clusters from subspaces. We also use ensemble clustering to increase the robustness of the clusters. The method relies on topic modeling both to obtain the two-level feature representation and to generate different ensemble components. By using both topics and words to cluster text data, we obtain more interpretable clusters, because we can measure the weight of words and topics in each cluster. To evaluate the proposed method, we conducted several experiments on seven real-life data sets. Some of these data sets are easy to cluster, others are hard, and still others contain unbalanced data. Experimental results on this diverse collection of data sets show that our method outperforms other methods for ensemble clustering.
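The abstract does not spell out how the ensemble components are combined, so the following is only a minimal sketch of one common consensus technique (a co-association matrix with majority-vote linkage), not the authors' method. The function names `co_association` and `consensus_clusters`, the `threshold` parameter, and the toy labelings are all assumptions made for illustration. Each base clustering stands in for a partition obtained from a different feature view (for example, one from word features and one from topic features).

```python
def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place documents i and j in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Hypothetical consensus step: link two documents when more than
    `threshold` of the base clusterings agree, then return the connected
    components as the final clusters."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy base clusterings of six documents (illustrative data, not from the paper).
word_view = [0, 0, 0, 1, 1, 1]   # e.g. a partition from word-level features
topic_view = [1, 1, 1, 0, 0, 0]  # same partition with permuted label ids
noisy_view = [0, 0, 1, 1, 1, 1]  # a component that misplaces document 2

print(consensus_clusters([word_view, topic_view, noisy_view]))
# → [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix compares pairs of documents rather than label ids, it is unaffected by the label permutation in `topic_view`, and the single misplaced document in `noisy_view` is outvoted by the other two components.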

Journal: International Journal of Machine Learning and Cybernetics
Publication Date: Jun 17, 2016
Citations: 3

Similar Papers

Stratified feature sampling method for ensemble clustering of high dimensional data
Liping Jing ... Joshua Z Huang
Pattern Recognition | VOL. 48
13 May 2015

Ensemble Clustering of High Dimensional Data with FastMap Projection
Imran Khan ... Graham Williams
01 Jan 2014

A feature grouping method for ensemble clustering of high-dimensional genomic big data
Dewan Md Farid ... Ann Nowe
01 Dec 2016

FastMap in dimensionality reduction: ensemble clustering of high dimensional data
Imran Khan ... Joshua Z Huang
International Journal of Data Science | VOL. 2
01 Jan 2017
