Abstract
To make better use of massive network comment data for decision-making support of customers and merchants in the big data era, this paper proposes two unsupervised optimized LDA (Latent Dirichlet Allocation) models for aspect-based opinion mining: SLDA (SentiWordNet WordNet-Latent Dirichlet Allocation) and HME-LDA (Hierarchical Clustering MaxEnt-Latent Dirichlet Allocation). For each of the two optimized models, we design a scheme that uses seed words as topic words and constructs an inverted index, which enhances the readability of the experimental results. Building on the LDA topic model, we introduce new indicator variables to refine the classification of topics, and the two schemes classify opinion target words and sentiment opinion words in different ways. To improve classification, the similarity between words and seed words is calculated in two ways to offset the fixed parameters of the standard LDA. In addition, using the SemEval2016ABSA data set and the Yelp data set, we design comparative experiments with training sets of different sizes and different seed words. The experiments show that SLDA and HME-LDA achieve better accuracy, recall, and harmonic value on unannotated training sets.
Highlights
With the development of the Internet, almost every aspect of human life has become digitized
Network comments are short, cover a wide range of topics, have few annotated corpora, and require aspect-based mining. This paper therefore proposes two schemes based on the latent Dirichlet allocation (LDA) topic model that are unsupervised and extend well, making it possible to perform aspect-based opinion mining on network comments with as little annotated data as possible
The first scheme is based on the inverted list and the SLDA (SentiWordNet WordNet-Latent Dirichlet Allocation) model proposed in this paper
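The inverted list mentioned above can be illustrated with a minimal sketch. The source does not describe how the index is built, so everything here is an assumption made for illustration: the function name `build_inverted_index`, the whitespace tokenizer, the sample comments, and the seed words are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_inverted_index(comments, seed_words):
    """Map each seed word to the sorted ids of the comments that contain it.

    Hypothetical helper: tokenization is a naive whitespace split with
    punctuation stripping, not the preprocessing used in the paper.
    """
    seeds = {w.lower() for w in seed_words}
    index = defaultdict(set)
    for doc_id, text in enumerate(comments):
        # Normalize tokens so "Service" and "service," match the seed "service".
        tokens = {tok.strip(".,!?;:").lower() for tok in text.split()}
        for word in tokens & seeds:
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Made-up comments and seed words, for illustration only.
comments = [
    "The pizza was great, but the service was slow.",
    "Great battery life on this laptop!",
    "Service was friendly and the price was fair.",
]
seed_words = ["service", "price", "battery"]
index = build_inverted_index(comments, seed_words)
```

With an index of this shape, the comments associated with a seed word can be looked up directly (for example `index["service"]`), which is one plausible way such a structure could make topic-level results easier to read.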
Summary
With the development of the Internet, almost every aspect of human life has become digitized. The effectiveness of these models drops sharply when the aspect category of the comments shifts from food and beverage to laptops. Supervised models such as BMAM [11] need far more manpower for data annotation than the models proposed in this paper, because few annotated training sets are available. Network comments are short, cover a wide range of topics, have few annotated corpora, and require aspect-based mining. This paper therefore proposes two schemes based on the LDA topic model that are unsupervised and extend well, making it possible to perform aspect-based opinion mining on network comments with as little annotated data as possible.