Knowledge-based vector space model for text clustering

Liping Jing, Joshua Z Huang, Michael K Ng

doi:10.1007/s10115-009-0256-5

Abstract

This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are incorporated when text documents are represented as a set of vectors. The aim is to calculate the dissimilarity between two documents more effectively so that text clustering results improve. The semantic relationship between two terms is defined by their similarity, which is used to re-weight term frequency in the VSM. We study two similarity measures for computing this relationship, each based on a different approach. The first approach relies on existing ontologies such as WordNet and MeSH: we define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach uses text corpora to construct the relationships between terms and then calculates their semantic similarities. Three clustering algorithms (bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm) were used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that the clustering performance based on the new model was much better than that based on the traditional term-based VSM.
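The re-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below assumes one plausible reading: each term's frequency is replaced by a similarity-weighted sum over all terms in the document. The function names `reweight` and `cosine_dissimilarity`, the toy vocabulary, and the similarity values are all hypothetical and chosen only for illustration; cosine dissimilarity is likewise an assumption, not necessarily the measure used in the paper.

```python
import math

def reweight(tf, sim):
    """Re-weight term frequencies with a term-term similarity matrix.

    Assumed form (not taken from the paper): new_tf[i] = sum_j sim[i][j] * tf[j],
    with sim[i][i] == 1 so a term always keeps its own count.
    """
    n = len(tf)
    return [sum(sim[i][j] * tf[j] for j in range(n)) for i in range(n)]

def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Toy vocabulary: ["car", "automobile", "banana"] (hypothetical).
# Made-up similarities: "car" and "automobile" are near-synonyms.
sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [2, 0, 0]  # mentions only "car"
doc_b = [0, 3, 0]  # mentions only "automobile"

# Term-based VSM: the documents share no terms, so they look unrelated.
plain = cosine_dissimilarity(doc_a, doc_b)

# Knowledge-based VSM (under the assumption above): related terms now overlap.
knowledge = cosine_dissimilarity(reweight(doc_a, sim), reweight(doc_b, sim))

print(plain, knowledge)
```

In this toy case the two documents share no surface terms, so the term-based representation treats them as maximally dissimilar, while the re-weighted vectors bring them close together. Any clustering algorithm that consumes document vectors could then be run on the re-weighted representation unchanged.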

Journal: Knowledge and Information Systems
Publication Date: Oct 2, 2009
Citations: 111

Similar Papers

Research of text clustering based on improved VSM by TF under the framework of Mahout
Cao Langcai ... Li Zhihui
01 May 2017

Novel similarity measure for document clustering based on topic phrases
A E Eldesoky ... M Saleh
01 Mar 2009

New Semantic Similarity Based Model for Text Clustering Using Extended Gloss Overlaps
Walaa K Gad ... Mohamed S Kamel
01 Jan 2009

An Analytical Assessment on Document Clustering
Pushplata ... Ram Chatterjee
International Journal of Computer Network and Information Security | VOL. 4
01 Jun 2012
