Combining Word and Character N-Grams for Detecting Deceptive Opinions

Al Hafiz Akbar Maulana Siagian, Masayoshi Aritsugi

doi:10.1109/compsac.2017.90

Abstract

Opinion reviews are a valuable and trusted source of information for readers. However, for business purposes, a huge number of deceptive opinions are intentionally posted on the Web. To keep opinion reviews a valuable and trusted resource, we propose a method for detecting positive and negative deceptive opinions. In this paper, we explore the feasibility of combining word and character n-grams as a feature for this task. Most studies on this task show that employing n-grams of either words or characters as a feature is sufficient to obtain good results. We evaluate the proposed method on a corpus of hotel reviews containing positive and negative opinions, both deceptive and truthful. We extract word n-grams and character n-grams from the reviews in the dataset and then combine them as a feature. Our experimental results show that the proposed method outperforms methods that use word or character n-grams alone. Furthermore, we consider applying Principal Component Analysis (PCA) to distinguish dominant from nonessential feature attributes, thereby reducing the number of feature attributes. The results obtained after removing the irrelevant feature attributes are similar to those obtained using all feature attributes.
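The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the n-gram sizes, tokenization, or weighting, so the whitespace tokenizer, lowercasing, raw counts, the sizes (word bigrams, character trigrams), and the function names `word_ngrams`, `char_ngrams`, and `combined_features` are all assumptions made for illustration. The PCA step is omitted.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams, assuming lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams (spaces included) over the lowercased text."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def combined_features(text, word_n=2, char_n=3):
    """Count word and character n-grams in one feature map.

    The "w:" and "c:" prefixes (an assumption of this sketch) keep the two
    feature families apart, so a one-word n-gram never collides with an
    identical character n-gram.
    """
    feats = Counter()
    for gram in word_ngrams(text, word_n):
        feats["w:" + gram] += 1
    for gram in char_ngrams(text, char_n):
        feats["c:" + gram] += 1
    return feats


# Hypothetical example review, not taken from the paper's corpus.
review = "The room was great"
features = combined_features(review)
```

Each review becomes one sparse count vector holding both feature families, which a classifier can then consume. A dimensionality-reduction step such as PCA would operate on the matrix built from these vectors.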

Full Text