Abstract

With the remarkable growth of e-commerce transactions, people try to make better purchasing decisions by considering the experiences that others report in product and service reviews. Automatic analysis of such corpora requires advanced algorithms based on natural language processing and opinion mining. Moreover, linguistic differences make extending existing algorithms from one language to another challenging and, in some cases, impossible. Opinion mining covers several review analysis tasks, such as spam detection, aspect elicitation, and polarity allocation. In this article, we focus on the detection of explicit aspects and propose a methodology for handling difficult aspect compounds that take a multi-word form in the Persian language. Our approach constructs a directed weighted graph (the ADG structure) from information yielded by the FP-Growth frequent pattern identification algorithm on our corpus of Persian sentences. Traversing particular paths within the ADG according to our developed rules leads to the extraction of these problematic multi-word aspects. We use the Neo4j NoSQL graph database and its Cypher query language to create the ADG and to retrieve the paths that reflect our rules, from which the multi-word aspects are extracted. A comparison of our methodology with existing approaches to aspect derivation in Persian, including ELDA, SAM, an MMI-based algorithm, and an LRT-based algorithm, indicates the robustness of our approach.
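To make the graph idea concrete, the following is a minimal sketch and not the authors' ADG construction or rule set. It assumes a simplification: instead of FP-Growth, it counts adjacent ordered word pairs that occur in at least `min_support` sentences, uses those counts as edge weights in a directed graph, and then enumerates short paths as multi-word candidates. The function names, the toy English sentences, and the "minimum edge weight" path score are all hypothetical choices made for illustration; the paper itself works on Persian text with Neo4j and Cypher.

```python
from collections import Counter, defaultdict


def build_graph(sentences, min_support=2):
    """Build a directed weighted graph from tokenized sentences.

    An edge a -> b is kept when 'a b' appears adjacently in at least
    `min_support` sentences; the weight is that sentence count.
    (Assumption: adjacent-pair counting stands in for FP-Growth here.)
    """
    pair_counts = Counter()
    for tokens in sentences:
        # Count each ordered adjacent pair at most once per sentence.
        pair_counts.update(set(zip(tokens, tokens[1:])))

    graph = defaultdict(dict)
    for (a, b), weight in pair_counts.items():
        if weight >= min_support:
            graph[a][b] = weight
    return dict(graph)


def multiword_candidates(graph, max_len=3):
    """Enumerate simple paths of 2..max_len nodes as candidate phrases.

    Each candidate is scored by the smallest edge weight on its path
    (a hypothetical scoring choice, not taken from the paper).
    """
    results = {}

    def walk(path, score):
        if len(path) >= 2:
            results[" ".join(path)] = score
        if len(path) == max_len:
            return
        for nxt, weight in graph.get(path[-1], {}).items():
            if nxt not in path:  # avoid cycles
                walk(path + [nxt], min(score, weight))

    for start in graph:
        walk([start], float("inf"))
    return results


# Toy corpus (English placeholders, already tokenized).
sentences = [
    ["battery", "life", "is", "great"],
    ["battery", "life", "is", "short"],
    ["screen", "resolution", "is", "sharp"],
    ["poor", "screen", "resolution"],
    ["battery", "life", "disappoints"],
]

graph = build_graph(sentences, min_support=2)
candidates = multiword_candidates(graph)
```

On this toy corpus the candidates include "battery life" and "screen resolution", but also noise such as "life is". That noise is the reason a rule layer over the paths is needed; in the sketch it would have to be added separately, for example as a part-of-speech filter.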

Highlights

  • Opinion mining, as a subdomain of data mining, is tightly related to natural language processing and has numerous applications in various domains, including customer relationship management and marketing

  • Problem statement: Because opinion mining as a whole, and aspect extraction as one of its subdomains, cannot be extended from one language to another owing to linguistic differences, this paper focuses on explicit multi-word aspect extraction in the Persian language

  • Evaluation: To evaluate our methodology, we first apply the fundamental frequency-based, POS-based, and Latent Dirichlet Allocation (LDA) algorithms to our dataset to extract single-word aspects


Summary

Introduction

Opinion mining, as a subdomain of data mining, is tightly related to natural language processing and has numerous applications in various domains, including customer relationship management and marketing. Tourism-related businesses could take advantage of this trend and attract customers through social media and websites on which customers and guests can freely describe their experiences of hotels and other tourism services. The problem is that, because linguistic structures differ greatly, an accurate and trustworthy text mining and NLP-based approach for one language cannot simply be extended to others. One challenge in this domain is that users express positive or negative sentiments about different features of a product or service, but this does not mean that they hold a positive or negative opinion of that product or service as a whole [1], [2]. Examples of aspects of a hotel entity are the hotel's location or architecture, and some aspects of a cell phone entity could

