Detection of spam reviews through a hierarchical attention architecture with N-gram CNN and Bi-LSTM

Yuxin Liu, Li Wang, Tengfei Shi, Jinyan Li

doi:10.1016/j.is.2021.101865

Abstract

Spam reviews mislead consumers' decisions and may seriously affect fair trading in online markets. Existing methods for detecting spam reviews mainly focus on designing features from linguistic and psychological clues, but they hardly reveal the underlying semantics. Recent work applies deep learning to capture semantic features, but these models neither extract multi-granularity information from the text structure nor consider the mutual influence among sentences. We propose a hierarchical attention network in which distinct attention mechanisms are used at the two layers to capture important, comprehensive, and multi-granularity semantic information. At the first layer, we use an N-gram CNN to extract the multi-granularity semantics of the sentences. At the second layer, we use a combination of a convolution structure and a Bi-LSTM to extract important and comprehensive semantics in a document. Extensive experiments on public datasets demonstrate that our model outperforms state-of-the-art baselines in detection, improving the F1 score in the mixed-domain setting to 89.3% (an absolute improvement of 4.8 points), in the Doctor domain to 92.8% (9.9 points), in the Hotel domain to 86.1% (2.4 points), and in the cross-domain setting to 84.7% (10.4 points).
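
To make the two-layer idea concrete, the following is a minimal, dependency-free sketch of hierarchical attention over multi-granularity n-gram features. It is an illustration under stated assumptions, not the authors' implementation: the n-gram "convolution" is a plain average over each window instead of a learned filter, the second layer uses simple attention pooling where the paper uses a convolution structure with a Bi-LSTM, and all function names, the query vectors, and the n-gram sizes (1, 2, 3) are hypothetical choices made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ngram_features(word_vecs, n):
    """Average every window of n consecutive word vectors.

    Assumption: averaging stands in for a learned convolution filter.
    """
    dim = len(word_vecs[0])
    feats = []
    for i in range(len(word_vecs) - n + 1):
        window = word_vecs[i:i + n]
        feats.append([sum(v[d] for v in window) / n for d in range(dim)])
    return feats


def attention_pool(vectors, query):
    """Weight vectors by softmax(dot(vector, query)); return (pooled, weights)."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_sentence(word_vecs, query, ngram_sizes=(1, 2, 3)):
    """First layer: pool n-gram features of several granularities with attention."""
    feats = []
    for n in ngram_sizes:
        if len(word_vecs) >= n:  # skip window sizes longer than the sentence
            feats.extend(ngram_features(word_vecs, n))
    return attention_pool(feats, query)


def encode_document(sentences, word_query, sent_query):
    """Second layer: attend over sentence vectors (no Bi-LSTM in this sketch)."""
    sent_vecs = [encode_sentence(s, word_query)[0] for s in sentences]
    return attention_pool(sent_vecs, sent_query)
```

In a trained model the query vectors and filters would be learned parameters, and the resulting document vector would feed a classifier that labels the review as spam or genuine; here they are fixed inputs so that the data flow through the two layers stays easy to follow.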

Full Text