A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Liping Jing, Xinhua Yang, Michael K Ng, Joshua Zhexue Huang

doi:10.5281/zenodo.1057143

Abstract

This paper presents a text clustering system built on a k-means type subspace clustering algorithm for clustering large, high-dimensional, sparse text data. The algorithm adds a new step to the k-means clustering process that automatically calculates the weight of each keyword in each cluster, so that the important words of a cluster can be identified by their weight values. To support understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. Candidate words are first selected according to the weights calculated by the new algorithm. The candidates are then passed to WordNet to identify the nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm outperforms other subspace clustering algorithms, such as PROCLUS and HARP, as well as k-means type algorithms such as Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology
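The weighting step described in the abstract can be illustrated with a small sketch. The abstract does not give the weight formula, so the code below assumes an entropy-style exponential update (a word whose values vary little inside a cluster gets a larger weight); `gamma`, `init_idx`, and the function names `weighted_kmeans` and `top_keywords` are hypothetical and are not taken from the paper. Restricting candidates to words that actually occur in a cluster is also an assumption of this sketch, and the WordNet step is not shown.

```python
import math

def weighted_kmeans(X, k, init_idx, gamma=1.0, max_iter=20):
    """k-means with one weight per (cluster, word); illustrative sketch only."""
    m = len(X[0])
    centers = [list(X[i]) for i in init_idx]      # initial centers: chosen rows
    weights = [[1.0 / m] * m for _ in range(k)]   # start with uniform weights
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each document to the nearest center (weighted distance).
        new_labels = [
            min(range(k), key=lambda l: sum(
                weights[l][j] * (x[j] - centers[l][j]) ** 2 for j in range(m)))
            for x in X
        ]
        if new_labels == labels:                  # assignments stable: stop
            break
        labels = new_labels
        for l in range(k):
            members = [x for x, lab in zip(X, labels) if lab == l]
            if not members:                       # keep old center if cluster is empty
                continue
            # Step 2: recompute the center as the mean of the members.
            centers[l] = [sum(x[j] for x in members) / len(members)
                          for j in range(m)]
            # Step 3 (the added step): per-word dispersion -> per-word weight.
            # ASSUMPTION: exponential update; the abstract gives no formula.
            disp = [sum((x[j] - centers[l][j]) ** 2 for x in members)
                    for j in range(m)]
            d_min = min(disp)                     # shift for numerical stability
            raw = [math.exp(-(d - d_min) / gamma) for d in disp]
            total = sum(raw)
            weights[l] = [r / total for r in raw]
    return labels, centers, weights

def top_keywords(centers, weights, vocab, n_words=2):
    """Pick the highest-weighted words that occur in each cluster."""
    result = []
    for center, w in zip(centers, weights):
        present = [j for j in range(len(vocab)) if center[j] > 0]
        ranked = sorted(present, key=lambda j: -w[j])
        result.append([vocab[j] for j in ranked[:n_words]])
    return result

# Toy term-frequency data (made up for illustration).
vocab = ["ball", "goal", "vote", "law", "the"]
docs = [
    [3, 2, 0, 0, 1], [3, 2, 0, 0, 4], [3, 2, 0, 0, 2],   # sports-like
    [0, 0, 2, 3, 1], [0, 0, 2, 3, 4], [0, 0, 2, 3, 2],   # politics-like
]
labels, centers, weights = weighted_kmeans(docs, k=2, init_idx=(0, 3))
keywords = top_keywords(centers, weights, vocab)
```

In this toy run the common word "the" varies widely inside each cluster, so it ends up with a much smaller weight than the topical words. In the system described above, the selected candidates would next be passed to WordNet to identify nouns and consolidate synonyms and hyponyms.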

Journal: Zenodo (CERN European Organization for Nuclear Research)
Publication Date: Apr 26, 2008
Citations: 12
License type: cc-by

Similar Papers

Subspace Clustering of Text Documents with Feature Weighting K-Means Algorithm
Liping Jing ... Jun Xu
01 Jan 2004

Grouping points by shared subspaces for effective subspace clustering
Ye Zhu ... Mark J Carman
Pattern Recognition | VOL. 83
31 May 2018

Dimensionality-reduced subspace clustering
Reinhard Heckel ... Michael Tschannen
Information and Inference: A Journal of the IMA | VOL. 6
14 Mar 2017

Why Subspace Clustering Works on Compressed Data?
Linghang Meng ... Yuantao Gu
01 Dec 2019
