Research of Text Classification Based on Improved TF-IDF Algorithm

Cai-Zhi Liu, Yan-Xiu Sheng, Yong-Quan Yang, Zhi-Qiang Wei

doi:10.1109/irce.2018.8492945

Abstract

With the rapid development of Internet technology, the volume of text data grows every day, and users must pick out the information they need from a large amount of text. Automatic text classification can help them find it. Traditional text classification techniques ignore the contextual semantic links between words and the differing importance of individual words. To address these problems, this paper represents feature words as vectors using the deep learning tool Word2vec and calculates the weight of each feature word with an improved TF-IDF algorithm. Multiplying a word's weight by its word vector yields the weighted vector representation of that word. Each text is then represented by accumulating all of its weighted word vectors, and text classification is carried out on the resulting text vectors.
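
The pipeline described in the abstract (weight each word, scale its vector, accumulate per text) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the weights use the standard TF-IDF formula rather than the improved variant, which the abstract does not specify; the two-dimensional word vectors are hand-written stand-ins for trained Word2vec embeddings; each distinct word contributes once per text, with its frequency carried by the TF term; and the names `tfidf_weights` and `document_vector` are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: weight} dict per document.

    Assumption: standard TF-IDF, tf * log(N / df), not the paper's
    improved variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


def document_vector(doc_weights, word_vectors, dim):
    """Accumulate weight * word vector over the distinct words of a text."""
    vec = [0.0] * dim
    for word, weight in doc_weights.items():
        embedding = word_vectors.get(word)
        if embedding is None:
            continue  # skip words that have no vector
        for i, value in enumerate(embedding):
            vec[i] += weight * value
    return vec


# Toy tokenized corpus (illustrative data only).
docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "dog", "play"],
]

# Hand-written stand-ins for Word2vec embeddings (assumption).
word_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "sat": [0.0, 1.0],
    "mat": [0.2, 0.8],
    "log": [0.1, 0.9],
    "play": [0.5, 0.5],
}

weights = tfidf_weights(docs)
doc_vectors = [document_vector(w, word_vectors, 2) for w in weights]
```

Each entry of `doc_vectors` is a fixed-length text representation that could be passed to any standard classifier. In practice the stand-in vectors would be replaced by embeddings trained on the corpus, and the weighting by the paper's improved TF-IDF.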

Full Text