The Text modeling method of Tibetan text combining Word2vec and improved TF-IDF

Jiang Tao, Ma Cao Wan, Jia Hao Meng, Li Jia

doi:10.1088/1742-6596/1601/4/042007

Open Access

Abstract

To address the problem that Tibetan text representations ignore the importance of individual words, this paper proposes a Tibetan text representation method that combines Word2vec with an improved TF-IDF. First, a Word2vec model is trained to obtain vectors for all words in the text, which captures the semantic information of the text. Second, the improved TF-IDF algorithm computes the weight of each word in the text, and hence of its word vector. The Word2vec vectors and the improved TF-IDF weights are then fused to construct a Tibetan text vector representation model based on word vectors and weights. Finally, a BiLSTM neural network classifier is used to classify the Tibetan text. Experimental results show that this method outperforms the traditional method on Tibetan text classification, which verifies its effectiveness.
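
The fusion step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not specify the improved TF-IDF, so standard TF-IDF with a +1 offset on the IDF term stands in for it; the word vectors are hand-written toy values rather than trained Word2vec output; and the tokens `w1`, `w2`, `w3` and the function names `tf_idf` and `weighted_sequence` are hypothetical placeholders. Each token's vector is scaled by its weight, giving a sequence that a sequence classifier such as a BiLSTM could consume.

```python
import math
from collections import Counter


def tf_idf(doc, corpus):
    """Standard TF-IDF weight for each distinct token in `doc`.

    Assumption: this is plain TF-IDF, used as a stand-in for the
    paper's improved variant, which the abstract does not define.
    """
    counts = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for token, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for d in corpus if token in d)
        # The +1 offset keeps a token that occurs in every document
        # from receiving a weight of exactly zero.
        idf = math.log(n_docs / df) + 1.0
        weights[token] = tf * idf
    return weights


def weighted_sequence(doc, corpus, vectors):
    """Scale each token's vector by the token's TF-IDF weight."""
    weights = tf_idf(doc, corpus)
    return [[weights[t] * x for x in vectors[t]] for t in doc if t in vectors]


# Hypothetical segmented documents and toy 2-dimensional word vectors.
corpus = [["w1", "w2", "w1"], ["w2", "w3"]]
vectors = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0]}

seq = weighted_sequence(corpus[0], corpus, vectors)
for token, vec in zip(corpus[0], seq):
    print(token, vec)
```

In this toy corpus `w1` appears twice in the first document and nowhere else, so its vector is scaled up more than that of `w2`, which appears in both documents. That is the intended effect of weighting: words that are distinctive for a document contribute more to its representation.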

Full Text