High Quality Algorithm for Chinese Short Messages Text Clustering Based on Semantic

Fengxia Yang

doi:10.2991/iccia.2012.313

Abstract

Existing data clustering methods do not take into account the latent similarity between words, which leads to unsatisfactory clustering results. For Chinese short message text clustering, this paper proposes a semantics-based clustering algorithm. It defines Chinese concepts and gives measures for the similarity between words and between Chinese short message texts. It clusters Chinese short message texts by fission downward and pairwise merging upward. Experimental results show that this algorithm achieves better clustering quality than traditional algorithms.

Text clustering is a form of unsupervised machine learning. By analyzing text content, it divides texts into meaningful classes so that similarity within a class is as high as possible and similarity between different classes is as low as possible. The common text clustering algorithms today are mainly hierarchical methods, represented by the G-HAC algorithm, and flat partitioning methods, represented by the K-means algorithm. There has been much work on text clustering in China and abroad, for example: a text clustering algorithm based on a semantic filtering model (1); a text clustering algorithm based on fuzzy concepts (2); a Web text clustering algorithm based on swarm intelligence (3); a text clustering algorithm based on semantic inner space (4); and an efficient text clustering algorithm that builds a primitive concept tree from the upper-lower relations between primitives and clusters by chain fission downward and pairwise merging upward (2). The author of (6) proposed a similarity calculation algorithm based on the HowNet model, but it applies only to similarity between words and concepts and does not cover text similarity.
This paper analyzes text from a semantic perspective: it first performs semantic disambiguation (7), represents each text as a set of keywords, computes word similarity from the similarity of non-weak primitives, and computes text similarity from word similarity. Because the algorithm analyzes the similarity between texts semantically, its results better match human intuition.
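
The pipeline described above (keyword sets, word similarity, text similarity, pairwise merging) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `WORD_SIM` table is a made-up stand-in for a HowNet-style primitive-based word similarity, `text_similarity` uses a symmetric best-match average, and `merge_pairs` is a plain average-link pairwise merge with a hypothetical `threshold`. The paper's own formulas and its downward fission step are not reproduced here.

```python
from itertools import combinations

# Hypothetical word-similarity table standing in for a primitive-based
# (HowNet-style) measure; the values are invented for illustration.
WORD_SIM = {
    frozenset(("meeting", "conference")): 0.9,
    frozenset(("dinner", "lunch")): 0.8,
    frozenset(("tonight", "evening")): 0.85,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, else the table value (0.0 if absent)."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def text_similarity(t1, t2):
    """Symmetric average of best-match word similarities between keyword sets."""
    if not t1 or not t2:
        return 0.0

    def directed(src, dst):
        # For each word in src, take its best match in dst, then average.
        return sum(max(word_similarity(w, v) for v in dst) for w in src) / len(src)

    return (directed(t1, t2) + directed(t2, t1)) / 2


def merge_pairs(texts, threshold=0.5):
    """Repeatedly merge the two most similar clusters until none reach threshold."""
    clusters = [[i] for i in range(len(texts))]

    def cluster_sim(c1, c2):
        # Average-link similarity between two clusters of text indices.
        total = sum(text_similarity(texts[i], texts[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((i, j), cluster_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Each short message is represented as a set of keywords.
messages = [
    {"meeting", "tonight"},
    {"conference", "evening"},
    {"dinner"},
    {"lunch"},
]
groups = merge_pairs(messages)
```

With these made-up values, messages that share no literal keyword can still be grouped together because their words are semantically close, which is the effect the paper attributes to semantic similarity.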
