Short text classification based on strong feature thesaurus

Bing-Kun Wang, Xing Li, Wan-Xia Yang, Yong-Feng Huang

doi:10.1631/jzus.c1100373

Abstract

Data sparseness, the defining characteristic of short text, has long been regarded as the main cause of low accuracy when short texts are classified with statistical methods, and it has been studied intensively over the past decade. However, most researchers have overlooked a second factor: ignoring the semantic importance of certain feature terms may also lower classification accuracy. In this paper we present a new method that addresses this problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. Giving larger weights to feature terms in the SFT improves classification accuracy. In particular, our method appears to be more effective as the classification becomes more detailed. Experiments on two short text datasets demonstrate that our approach improves on state-of-the-art methods, including support vector machine (SVM) and Naive Bayes Multinomial.
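
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LDA step is omitted, the thesaurus is selected by information gain alone, and the toy corpus, the function names, `top_k` and the `boost` factor are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary event 'term occurs in the document' w.r.t. the class."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty partitions to avoid dividing by zero
            conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def build_thesaurus(docs, labels, top_k):
    """Assumption: keep the top_k terms by IG (the paper also uses LDA)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return set(ranked[:top_k])


def weighted_counts(doc, thesaurus, boost=2.0):
    """Bag-of-words counts with thesaurus terms scaled by a hypothetical boost."""
    counts = Counter(doc)
    return {t: c * (boost if t in thesaurus else 1.0) for t, c in counts.items()}


# Toy labelled corpus of tokenized short texts (made up for this sketch).
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["match", "score", "news"],
    ["match", "ticket", "deal"],
]
labels = ["travel", "travel", "sport", "sport"]

sft = build_thesaurus(docs, labels, top_k=2)
features = weighted_counts(["cheap", "flight", "news"], sft)
```

The resulting weighted feature dictionaries could then be fed to any standard classifier; how the boost should be set, and how LDA topics contribute to term selection, is left to the paper itself.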

Journal: Journal of Zhejiang University SCIENCE C
Publication Date: Sep 1, 2012
Citations: 34

Similar Papers

Leveraging Knowledge-Based Features With Multilevel Attention Mechanisms for Short Arabic Text Classification
Iyad Alagha
IEEE Access | VOL. 10
01 Jan 2021

Systematic framework for short text classification based on improved TWE and supervised MCFS topic merging strategy
Baoshan Sun ... Chunqing Li
International Journal of Computers and Applications | VOL. 44
06 May 2020

Short Text Classification Based on Hierarchical Heterogeneous Graph and LDA Fusion
Xinlan Xu ... Bing Luo
Electronics | VOL. 12
06 Jun 2023

Short Text Classification Based on Cross-Connected GRU Kernel Mapping Support Vector Machine
Qi Wang ... Zhaoying Liu
01 Nov 2021
