Abstract

In text classification, the traditional vector space model produces high-dimensional, sparse feature vectors. The Word2vec model, which is based on a shallow neural network, overcomes this problem well. However, Word2vec cannot distinguish the importance of words in a text, and for short texts the importance of words greatly affects classification accuracy. To solve this problem, we use the Term Frequency-Inverse Corpus Frequency (TF-ICF) algorithm to calculate the weight of each word in the corpus, and obtain weighted Word2vec word vectors by multiplying each word vector by its calculated weight. Summing the weighted word vectors gives a weighted vector representation of the text. We then combine the weighted Word2vec text vector with the TF-ICF text vector to represent the text, and apply this representation to short text classification. Experiments on a micro-blog short text dataset demonstrate the effectiveness of the new method.
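The pipeline described above can be illustrated with a minimal sketch. The abstract does not give the exact TF-ICF formula or say how the two text vectors are combined, so the smoothed inverse-frequency weight, the use of concatenation, the toy corpus, and the hand-written two-dimensional embeddings (standing in for a trained Word2vec model) are all assumptions made for illustration. The function names are hypothetical.

```python
import math
from collections import Counter

def icf_weights(corpus):
    """Inverse corpus frequency per term.

    ASSUMPTION: smoothed form log((N + 1) / (n_t + 1)) + 1; the paper's
    exact formula is not given in the abstract.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((n_docs + 1) / (df + 1)) + 1.0
            for t, df in doc_freq.items()}

def term_weights(doc, icf):
    """TF-ICF weight of each distinct term in one tokenized text."""
    counts = Counter(doc)
    total = len(doc) or 1
    return {t: (c / total) * icf.get(t, 0.0) for t, c in counts.items()}

def tficf_vector(doc, icf, vocab):
    """Sparse-style TF-ICF text vector, one dimension per vocabulary term."""
    weights = term_weights(doc, icf)
    return [weights.get(t, 0.0) for t in vocab]

def weighted_w2v_vector(doc, icf, embeddings, dim):
    """Sum of word vectors, each scaled by its TF-ICF weight."""
    vec = [0.0] * dim
    for term, w in term_weights(doc, icf).items():
        if term not in embeddings:
            continue  # skip out-of-vocabulary words
        for i, x in enumerate(embeddings[term]):
            vec[i] += w * x
    return vec

def combined_vector(doc, icf, embeddings, dim, vocab):
    """ASSUMPTION: 'combine' is taken to mean concatenation."""
    return weighted_w2v_vector(doc, icf, embeddings, dim) + tficf_vector(doc, icf, vocab)

# Toy tokenized corpus and toy embeddings (placeholders for real data).
corpus = [["good", "phone"], ["bad", "phone"], ["good", "movie"]]
embeddings = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "phone": [0.0, 1.0],
    "movie": [0.0, 0.5],
}
dim = 2
vocab = sorted({t for doc in corpus for t in doc})
icf = icf_weights(corpus)

text_vector = combined_vector(corpus[1], icf, embeddings, dim, vocab)
print(len(text_vector), text_vector)
```

In this sketch the rarer word "bad" receives a larger weight than the more common "phone", so it contributes more to the dense part of the text vector. The resulting vector could then be passed to any standard classifier.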
