Abstract

Latent Dirichlet allocation (LDA) is a probabilistic topic model that discovers the latent topic structure in a document collection. The basic assumption underlying LDA is that each document is a probabilistic mixture of latent topics, each topic is a probability distribution over words, and each document is modelled as a bag of words. Topic models such as LDA are effective at learning hidden topics, but they do not take into account the deeper semantic knowledge of a document. In this article, we propose a novel method based on topic modelling to determine the latent aspects of online review documents. In the proposed model, called Concept-LDA, the feature space of reviews is enriched with concepts and named entities extracted using Babelfy, so that the resulting topics contain not only co-occurring words but also semantically related words. Performance in terms of topic coherence and topic quality is reported over 10 publicly available datasets, and Concept-LDA is shown to achieve better topic representations than an LDA model alone, as measured by topic coherence and F-measure. The topic representation learned by Concept-LDA makes aspect extraction in an aspect-based sentiment analysis system both accurate and easy.
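
To make the idea of an enriched feature space concrete, here is a minimal sketch in Python. The abstract does not say how concepts and named entities are merged into the bag of words, so the sketch assumes they are simply appended as extra tokens. The `TOY_CONCEPTS` table, the `CONCEPT:`/`ENTITY:` label format, and the helper names are all hypothetical; the table stands in for Babelfy output and no real Babelfy call is made.

```python
from collections import Counter

# Hypothetical stand-in for concept / named-entity annotations.
# In the paper these come from Babelfy; the mappings below are invented
# purely for illustration and are not real Babelfy output.
TOY_CONCEPTS = {
    "battery": "CONCEPT:power_supply",
    "charger": "CONCEPT:power_supply",
    "screen": "CONCEPT:display",
    "display": "CONCEPT:display",
    "iphone": "ENTITY:Apple_iPhone",
}

PUNCTUATION = ".,!?;:"


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in stripped if word]


def enrich(tokens, concepts=TOY_CONCEPTS):
    """Return the original tokens plus one concept token per matched word.

    Assumption: enrichment means appending concept tokens to the document,
    so that different words sharing a concept also share a feature.
    """
    extra = [concepts[token] for token in tokens if token in concepts]
    return tokens + extra


def bag_of_words(text):
    """Build the enriched bag-of-words counts for a single review."""
    return Counter(enrich(tokenize(text)))


reviews = [
    "The battery dies fast and the charger is slow.",
    "Great screen, the display on this iPhone is sharp!",
]
bags = [bag_of_words(review) for review in reviews]
```

In this sketch "battery" and "charger" never co-occur with a shared word, yet both contribute to the same `CONCEPT:power_supply` feature. A standard LDA implementation trained on such enriched counts could then group them on the basis of that shared feature rather than word co-occurrence alone.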
