A Study on Text Clustering Algorithms Based on Frequent Term Sets

Xiangwei Liu, Pilian He

doi:10.1007/11527503_42

Abstract

This paper introduces a new text-clustering algorithm, Frequent Term Set-based Clustering (FTSC), which uses frequent term sets to cluster texts. First, it extracts useful information from the documents and inserts it into a database. Then, it uses the Apriori algorithm from association rule mining to discover the frequent item sets efficiently. Finally, it clusters the documents according to the frequent words in subsets of the frequent term sets. The algorithm efficiently reduces the dimensionality of the text data for very large databases, and can thereby improve both the accuracy and the speed of clustering. However, the clusters produced by FTSC cannot reflect overlap between text classes. To address this, an improved algorithm based on FTSC, Frequent Term Set-based Hierarchical Clustering (FTSHC), is presented. It determines the overlap of text classes from the overlap of the frequent word sets, and it provides an understandable description of the discovered clusters through the frequent term sets. The FTSC, FTSHC, and K-Means algorithms are evaluated quantitatively in experiments. The experimental results show that FTSC and FTSHC are more efficient than K-Means in clustering performance.
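The pipeline the abstract describes (mine frequent term sets with Apriori, then group documents by the term sets they contain) can be illustrated with a small sketch. This is not the authors' FTSC implementation: the function names `frequent_term_sets` and `assign_clusters`, the toy documents, the `min_support` value, and the rule of assigning each document to the largest frequent term set it contains are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support):
    """Apriori-style search: return {frozenset(terms): support count}."""
    doc_sets = [set(d) for d in docs]
    # Level 1 candidates: every single term that occurs in any document.
    candidates = {frozenset([t]) for d in doc_sets for t in d}
    frequent = {}
    k = 1
    while candidates:
        # Count how many documents contain each candidate term set.
        counts = {c: sum(1 for d in doc_sets if c <= d) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite pairs of frequent (k-1)-sets into k-sets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def assign_clusters(docs, frequent):
    """Hypothetical rule: put each document in the largest frequent term set
    it contains (ties broken by support, then by sorted terms)."""
    clusters = {}
    for i, d in enumerate(docs):
        covered = [ts for ts in frequent if ts <= set(d)]
        if not covered:
            continue  # document contains no frequent term set
        best = max(covered, key=lambda ts: (len(ts), frequent[ts], sorted(ts)))
        clusters.setdefault(best, []).append(i)
    return clusters


# Toy corpus (made up): each document is already reduced to a list of terms.
docs = [
    ["text", "cluster", "term"],
    ["text", "cluster", "apriori"],
    ["image", "pixel", "filter"],
    ["image", "pixel", "edge"],
]
frequent = frequent_term_sets(docs, min_support=2)
clusters = assign_clusters(docs, frequent)
for terms, members in clusters.items():
    print(sorted(terms), members)
```

Because each cluster is labeled by the term set that defines it, the label doubles as a readable description of the cluster. Letting a document join every frequent term set it contains, rather than only the largest, would be one simple way to expose the class overlap that the abstract attributes to FTSHC; that variant is likewise an assumption, not the published method.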

Similar Papers

The research of text clustering algorithms based on frequent term sets
Xiang-Wei Liu ... Hui-Ying Wang
01 Jan 2004

Study on frequent term set-based hierarchical clustering algorithm
Huiying Wang ... Xiangwei Liu
01 Jul 2011

Text clustering approach based on maximal frequent term sets
Chong Su ... Xiaolong Wang
01 Oct 2009

A Research about Frequent Contextual Termset
...
11 May 2012
