Cross-domain comparison of algorithm performance in extracting aspect-based opinions from Chinese online reviews

Wei Wang, Guanyin Tan, Hongwei Wang

doi:10.1007/s13042-016-0596-x

Abstract

Extracting aspects and opinions is the basis of fine-grained sentiment analysis. It is usually carried out in one of two ways: rule-based approaches or machine learning approaches. However, no conclusion has yet been drawn about how well these algorithms apply across multiple domains in Chinese, so their robustness and reliability across different fields remain a concern. We compare ten approaches to aspect-opinion extraction on Chinese corpora from seven domains. The compared methods are a TF-based model plus POS, CRFs-based opinion mining, SVM-based opinion mining, MNB-based opinion mining, HMM-based opinion mining, RFM-based opinion mining, RNN-based opinion mining, KNN-based opinion mining, CART-based opinion mining and LPM-based opinion mining. We collect 3146 Chinese reviews as corpora, covering digital cameras, cosmetics, books, hotels, movies, cellphones and restaurants. The experiments show that (1) no algorithm dominates across all domains; (2) machine learning algorithms outperform rule-based approaches; (3) text length negatively affects the accuracy of opinion mining for rule-based approaches, while some machine learning methods perform well on long reviews; (4) the HMM-based, RFM-based, RNN-based, KNN-based, CART-based and LPM-based models perform similarly in terms of precision and recall; and (5) overall, the SVM-based approach performs best for opinion mining in almost all domains.
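As an illustration of the precision and recall figures mentioned above, the following is a minimal sketch of how extracted aspect-opinion pairs could be scored against hand-labelled pairs. It assumes exact matching of (aspect, opinion) pairs, which may differ from the matching criterion used in the paper; the function name `precision_recall_f1` and the sample pairs are hypothetical and are not taken from the study's data.

```python
from typing import Iterable, Set, Tuple

# An (aspect, opinion) pair, e.g. ("屏幕", "清晰") = ("screen", "clear").
Pair = Tuple[str, str]


def precision_recall_f1(
    predicted: Iterable[Pair], gold: Iterable[Pair]
) -> Tuple[float, float, float]:
    """Score extracted pairs against gold pairs using exact matching (an assumption)."""
    pred_set: Set[Pair] = set(predicted)
    gold_set: Set[Pair] = set(gold)
    true_pos = len(pred_set & gold_set)  # pairs that were extracted and are correct

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example for a single review domain (illustrative only).
gold_pairs = [("屏幕", "清晰"), ("电池", "耐用"), ("价格", "贵")]
predicted_pairs = [("屏幕", "清晰"), ("电池", "耐用"), ("镜头", "模糊"), ("服务", "好")]

p, r, f = precision_recall_f1(predicted_pairs, gold_pairs)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Running the same function once per domain and per method would give the kind of cross-domain table that the comparison relies on.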

Full Text