Abstract

Automatic text classification is a research focus and core technology in information retrieval and natural language processing. Unlike traditional text classification methods (SVM, Bayesian, KNN), the class-center vector method is an important text classification method with the advantages of low computational cost and high efficiency. However, the traditional class-center vector method has the disadvantages that the class vector is large and sparse, and its classification accuracy is not high because it lacks semantic information. To overcome these problems, this paper proposes a novel class-center vector model for text classification using dependencies and a semantic dictionary. We use the WordNet English semantic dictionary and the Tongyici Cilin Chinese semantic dictionary, respectively, to cluster the English or Chinese feature words in the class-center vector and to significantly reduce its dimension, thereby realizing a new class-center vector for text classification using dependencies and a semantic dictionary. Experiments show that, compared with traditional text classification algorithms, the improved class-center vector method has lower time complexity and higher accuracy on the 20 Newsgroups English corpus and the Fudan and Sogou Chinese corpora. This paper is an improved version of our NLPCC 2019 conference paper.
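To make the basic idea concrete, below is a minimal sketch of plain class-center vector classification: each class is represented by the average of its documents' TF-IDF vectors, and a new document is assigned to the class whose center it is most similar to. The helper names (build_class_centers, classify) and the use of scikit-learn's TfidfVectorizer are illustrative assumptions; the sketch does not include the paper's dependency-based weights or dictionary-based clustering.

# Minimal sketch of class-center vector classification (hypothetical helper names;
# plain TF-IDF vectors, not the paper's improved weights).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_class_centers(X, labels):
    """Average the document vectors of each class into one class-center vector."""
    classes = sorted(set(labels))
    centers = np.vstack([X[[i for i, y in enumerate(labels) if y == c]].mean(axis=0)
                         for c in classes])
    return classes, np.asarray(centers)

def classify(X_test, classes, centers):
    """Assign each test document to the class whose center is most similar."""
    sims = cosine_similarity(X_test, centers)   # one similarity score per class
    return [classes[i] for i in sims.argmax(axis=1)]

# Toy usage
train_docs = ["the team won the match", "stocks fell on the market",
              "the player scored a goal", "investors sold shares"]
train_labels = ["sport", "finance", "sport", "finance"]
vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)
classes, centers = build_class_centers(X_train, train_labels)
print(classify(vec.transform(["the striker scored twice"]), classes, centers))  # -> ['sport']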

Highlights

  • With the rapid development and increasing popularity of Internet technology, electronic text information is expanding rapidly

  • (1) Building on the traditional TFIDF algorithm, we introduce dependencies, synonyms from the semantic dictionary, and part-of-speech information to understand and optimize the text features, and put forward an improved weight calculation method based on TFIDF. (2) We use the category nodes located in layers 6-9 of WordNet and the category codes marked with "#" in the Tongyici Cilin Extended Version, respectively, to cluster the English or Chinese feature words in the class-center vector and to significantly reduce the dimension of the class-center vector, thereby realizing a new class-center vector (see the WordNet sketch after this list)

  • After classifying the text features in the corpus according to dependencies, this paper proposes a TFIDF weight calculation method based on dependencies and the synonyms in the semantic dictionary
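As a rough illustration of the WordNet side of point (2), the following sketch maps an English feature word onto a hypernym node whose depth falls in the 6-9 range, using NLTK's WordNet interface. Taking the first noun synset and its first hypernym path is an assumption made for brevity; the paper's actual clustering procedure (and its handling of Tongyici Cilin for Chinese) may differ.

# Sketch: map an English feature word to a WordNet hypernym node at depth 6-9,
# so that related feature words collapse onto a shared category node.
# (Assumption: first noun synset, first hypernym path; requires nltk.download('wordnet').)
from nltk.corpus import wordnet as wn

def category_node(word, min_depth=6, max_depth=9):
    """Return the name of a hypernym synset whose depth lies in [min_depth, max_depth]."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return word                              # leave unknown words unchanged
    path = synsets[0].hypernym_paths()[0]        # root -> ... -> synset
    for node in path:
        if min_depth <= node.min_depth() <= max_depth:
            return node.name()
    return synsets[0].name()

print(category_node("dog"), category_node("cat"))  # both tend to map to the same animal-level node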


Summary

INTRODUCTION

With the rapid development and increasing popularity of Internet technology, electronic text information is expanding rapidly. Building on the traditional TFIDF algorithm, we introduce dependencies, synonyms from the semantic dictionary, and part-of-speech information to understand and optimize the text features, and put forward an improved weight calculation method based on TFIDF. After classifying the text features in the corpus according to dependencies, this paper proposes a TFIDF weight calculation method based on dependencies and the synonyms in the semantic dictionary. According to the result of the dependency syntactic analysis performed by the Stanford Parser, we obtain the sentence component of the jth (1 ≤ j ≤ m) occurrence of the feature word ti in the text, classify that component into level ki,j according to TABLE 2, and assign it a weight wi,j, calculated as wi,j = 2 cos(ki,j π / λ). In the TFIDF part of the weight, s denotes the total number of words in the text in which the feature word ti is located, D denotes the total number of texts in the corpus, and pi denotes the number of texts containing the feature word ti
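The weight computation described above can be sketched as follows. This is not the paper's implementation: the mapping from dependency relations to levels stands in for TABLE 2, λ = 8 is an arbitrary placeholder, and the IDF variant with add-one smoothing is a common choice rather than the paper's exact formula. Dependency relations are assumed to have been extracted already (e.g., by the Stanford Parser).

# Sketch: dependency-weighted TF-IDF for one feature word, under stated assumptions.
# LEVEL_OF_DEPREL and LAMBDA are placeholders; the paper's TABLE 2 defines the real mapping.
import math

LEVEL_OF_DEPREL = {"nsubj": 1, "dobj": 1, "root": 1,   # core arguments  -> level 1 (assumed)
                   "amod": 2, "advmod": 2,             # modifiers       -> level 2 (assumed)
                   "det": 3, "case": 3}                # function-like   -> level 3 (assumed)
LAMBDA = 8                                             # placeholder scaling parameter

def dependency_weight(deprels):
    """Sum w_{i,j} = 2*cos(k_{i,j} * pi / LAMBDA) over the occurrences of one feature word."""
    return sum(2 * math.cos(LEVEL_OF_DEPREL.get(rel, 3) * math.pi / LAMBDA)
               for rel in deprels)

def weighted_tfidf(deprels, s, D, p_i):
    """Combine the summed dependency weight with an IDF factor
    (s: words in the text, D: texts in the corpus, p_i: texts containing the feature word)."""
    tf = dependency_weight(deprels) / s
    idf = math.log(D / (p_i + 1)) + 1
    return tf * idf

# A feature word occurring twice, once as a subject and once inside a modifier:
print(weighted_tfidf(["nsubj", "amod"], s=120, D=20000, p_i=350))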

TFIDF WEIGHT IMPROVEMENT BASED ON PART-OF-SPEECH
CLASS-CENTER VECTOR CLUSTERING APPROACH BASED ON A SEMANTIC DICTIONARY
A NEW VECTOR SIMILARITY METHOD FOR CLUSTERED CLASS-CENTER VECTORS
Findings
CONCLUSION