Abstract

The complexity of the Chinese language poses a great challenge to sentiment analysis. Traditional manual feature selection is prone to inaccurate segmentation semantics, and high-quality preprocessing results are of great significance for subsequent network model learning. To effectively extract the key features of sentences, retaining feature words while removing irrelevant noise and reducing vector dimensionality, a feature-engineering module based on a sentiment lexicon combined with Word2vec incremental training is proposed. First, the data set is cleaned and each sentence is segmented with Jieba after loading a custom sentiment lexicon. Second, the segmented text with stopwords removed is fed to the Skip-gram training algorithm to obtain a word vector model. Third, the model is further trained incrementally on a large corpus to obtain more accurate word vectors. Finally, the word vectors are passed through an embedding layer into a neural network model, which learns and classifies the features. Comparative experiments with multiple models show that the combined CNN-BiLSTM-Attention model achieves better classification performance and stronger practical applicability.
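The following is a minimal sketch of the preprocessing pipeline outlined above (Jieba segmentation with a custom sentiment lexicon, stopword removal, Skip-gram training, and Word2vec incremental training), assuming gensim 4.x; the file paths (sentiment_lexicon.txt, stopwords.txt, reviews.txt, large_corpus.txt) are hypothetical placeholders, not names from the paper.

```python
# Sketch of the described pipeline, not the authors' exact implementation.
import jieba
from gensim.models import Word2Vec

# Load the custom sentiment lexicon so Jieba keeps sentiment words intact.
jieba.load_userdict("sentiment_lexicon.txt")            # hypothetical path

with open("stopwords.txt", encoding="utf-8") as f:      # hypothetical path
    stopwords = set(line.strip() for line in f)

def preprocess(texts):
    """Segment each sentence with Jieba and drop stopwords."""
    return [[w for w in jieba.lcut(t) if w.strip() and w not in stopwords]
            for t in texts]

corpus = preprocess(open("reviews.txt", encoding="utf-8").read().splitlines())

# Train an initial Skip-gram (sg=1) word-vector model on the task corpus.
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=10)

# Incremental training: extend the vocabulary with a larger external corpus
# and continue training to refine the vectors.
extra = preprocess(open("large_corpus.txt", encoding="utf-8").read().splitlines())
model.build_vocab(extra, update=True)
model.train(extra, total_examples=len(extra), epochs=model.epochs)

# model.wv can then initialize the embedding layer of a downstream
# classifier such as CNN-BiLSTM-Attention.
```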
