Abstract

The Latent Dirichlet Allocation (LDA) model, a document-level probabilistic model, has been widely used in topic modeling. However, a key limitation of LDA is its inability to identify co-occurrence relationships (e.g., aspect-aspect and aspect-opinion relationships) within sentences. To address this problem, we propose an association-constrained LDA (AC-LDA) that effectively captures these co-occurrence relationships. Specifically, based on the syntactic structure of product reviews, we formalize three major types of word association combinations and design corresponding identification rules. To reduce the influence of global aspect words on local distributions, we apply a constraint on global aspects. Finally, this constraint and the association combinations are incorporated into LDA to guide topic-word allocation during learning. Experiments on real-world product review data demonstrate that our model effectively captures the relationships hidden in local sentences and improves the extraction rate of fine-grained aspects and opinion words. Our results confirm that AC-LDA outperforms state-of-the-art methods in extraction accuracy. We also verify the strength of our method in identifying irregularly appearing terms, such as non-aspect opinions, low-frequency words, and secondary aspects.
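
The abstract does not specify how the association combinations enter the sampler, so the following is only a minimal sketch of the general idea under stated assumptions: a collapsed Gibbs sampler for LDA in which word pairs linked by an in-sentence association (e.g., an aspect-opinion pair) bias each other's topic choice, standing in for the paper's constraint mechanism. The corpus format, the `assoc_pairs` structure, and the boost factor are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def constrained_lda_gibbs(docs, assoc_pairs, vocab_size, n_topics=5,
                          alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with a simple association constraint.

    docs        : list of documents, each a list of word ids
    assoc_pairs : dict mapping doc index -> list of (i, j) token-index pairs
                  whose words are associated and should prefer a shared topic
                  (hypothetical encoding of the co-occurrence relationships)
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))      # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # topic totals
    z = []                                  # topic assignment per token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for i, w in enumerate(doc):
            ndk[d, zd[i]] += 1
            nkw[zd[i], w] += 1
            nk[zd[i]] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            # Collect, for each token, the tokens it is associated with.
            linked = {}
            for i, j in assoc_pairs.get(d, []):
                linked.setdefault(i, set()).add(j)
                linked.setdefault(j, set()).add(i)
            for i, w in enumerate(doc):
                k_old = z[d][i]
                ndk[d, k_old] -= 1
                nkw[k_old, w] -= 1
                nk[k_old] -= 1
                # Standard collapsed Gibbs conditional for LDA.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + beta * vocab_size)
                # Association constraint (illustrative): up-weight topics
                # currently assigned to linked tokens in the same sentence.
                for j in linked.get(i, ()):
                    p[z[d][j]] *= 2.0
                p /= p.sum()
                k_new = rng.choice(n_topics, p=p)
                z[d][i] = k_new
                ndk[d, k_new] += 1
                nkw[k_new, w] += 1
                nk[k_new] += 1
    return ndk, nkw
```

A soft multiplicative boost is used here rather than a hard must-link constraint so the sampler remains well defined when associated tokens disagree; the paper's global-aspect constraint would act analogously, down-weighting global aspect words in the local conditional.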
