Enriched LDA (ELDA): Combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction

Mohammadreza Shams, Ahmad Baraani-Dastjerdi

doi:10.1016/j.eswa.2017.02.038

Abstract

Aspect extraction is one of the fundamental steps in analyzing the opinions, feelings, and emotions expressed in textual data about a given topic. Current aspect extraction techniques are mostly based on topic models; however, relying on topic models alone tends to produce incoherent aspects. This paper therefore aims to discover more precise aspects by incorporating co-occurrence relations as prior domain knowledge into the Latent Dirichlet Allocation (LDA) topic model. In the proposed method, preliminary aspects are first generated with LDA. Then, in an iterative manner, prior knowledge is extracted automatically from co-occurrence relations and from similar aspects of relevant topics. Finally, the extracted knowledge is incorporated into the LDA model. These iterations progressively improve the quality of the extracted aspects. The effectiveness of the proposed method, ELDA, for the aspect extraction task is evaluated through experiments on two datasets, one in English and one in Persian. The experimental results indicate that ELDA not only outperforms state-of-the-art alternatives in terms of topic coherence and precision, but also does not depend on the language in which the text is written and can be applied to any language with reasonable accuracy. ELDA can therefore benefit natural language processing applications, particularly in languages with limited linguistic resources.
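To make the two measurable ingredients of this pipeline more concrete, the sketch below shows (a) one simple way to mine document-level co-occurrence relations among a topic's top words as candidate prior-knowledge pairs, and (b) the UMass topic coherence score (Mimno et al., 2011), one common way of quantifying how coherent an aspect's top words are. This is a minimal, hypothetical illustration, not the ELDA algorithm: the LDA step and the way knowledge is fed back into LDA are not shown, the top-word lists are hard-coded stand-ins for LDA output, and the helper names (`doc_frequencies`, `cooccurrence_must_links`, `umass_coherence`), the positive-PMI filter, and the `min_count`/`min_pmi` thresholds are assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): each document is a list of tokens.
DOCS = [
    "battery life is long".split(),
    "battery drains fast short life".split(),
    "screen is bright and sharp".split(),
    "bright screen good battery".split(),
    "sharp screen great display".split(),
]


def pair_key(a, b):
    """Canonical (sorted) key for an unordered word pair."""
    return (a, b) if a < b else (b, a)


def doc_frequencies(docs):
    """Count the documents containing each word and each unordered word pair."""
    df, co_df = Counter(), Counter()
    for doc in docs:
        vocab = sorted(set(doc))          # sorting makes pair keys canonical
        df.update(vocab)
        co_df.update(combinations(vocab, 2))
    return df, co_df


def cooccurrence_must_links(top_words, df, co_df, n_docs, min_count=2, min_pmi=0.0):
    """Hypothetical helper: keep pairs of a topic's top words that co-occur in at
    least `min_count` documents and have PMI above `min_pmi` (an assumed criterion)."""
    links = []
    for a, b in combinations(top_words, 2):
        c = co_df[pair_key(a, b)]
        if c < min_count:
            continue
        pmi = math.log(c * n_docs / (df[a] * df[b]))
        if pmi > min_pmi:
            links.append((pair_key(a, b), round(pmi, 3)))
    return links


def umass_coherence(top_words, df, co_df):
    """UMass coherence: sum over m > l of log((D(w_m, w_l) + 1) / D(w_l)),
    where top_words is ordered from most to least probable and every word
    is assumed to occur in at least one document."""
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            score += math.log((co_df[pair_key(w_m, w_l)] + 1) / df[w_l])
    return score


if __name__ == "__main__":
    df, co_df = doc_frequencies(DOCS)
    for topic in (["battery", "life", "screen"], ["screen", "bright", "sharp"]):
        print(topic,
              "coherence:", round(umass_coherence(topic, df, co_df), 4),
              "links:", cooccurrence_must_links(topic, df, co_df, len(DOCS)))
```

On this toy corpus the mixed topic `["battery", "life", "screen"]` receives a lower coherence score than `["screen", "bright", "sharp"]`, and the mined pair `("battery", "life")` is the kind of relation that could be passed back to the topic model as prior knowledge so that a later iteration keeps those words in the same aspect.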

Full Text