Abstract

Predicting the helpfulness of online reviews is of great significance to consumers because it largely affects their purchase decisions. To generate accurate helpfulness predictions for online reviews, we need to consider multimodal information, including both text and images. Although extensive work has made great advances on the Review Helpfulness Prediction (RHP) task, achieving a fine-grained fusion of the two heterogeneous modalities remains challenging. Therefore, we propose a novel RHP method, DMFN, which exploits multi-level information from both text and images to enhance the fine-grained representation of multimodal data. We further employ a weighted token-wise interaction to preserve finer-grained information in a lightweight way. Architecturally, DMFN consists of three parts: a multi-granularity feature generation part, a weighted token-wise interaction part, and a gated multi-level full fusion part. The first part is built to fully extract the visual and linguistic features from multimodal reviews. The second part is designed to decouple the multi-granularity content and then fully exploit the pair-wise correlations to obtain a disentangled manifold for the multimodal inputs. The last part is constructed to adaptively integrate cross-modal features, controlling the fusion of different features at multiple scales. Extensive experimental results on RHP datasets collected from Yelp.com and Amazon.com show that our proposed approach significantly outperforms existing benchmark RHP methods.
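The abstract does not give implementation details, but as a rough illustration of what a weighted token-wise interaction followed by a gated cross-modal fusion could look like, the PyTorch sketch below scores pair-wise text-image token similarities, re-weights the aligned visual context by a learned per-token importance, and mixes it with the text features through an element-wise gate. All class names, shapes, and formulas here are illustrative assumptions, not the authors' actual DMFN design.

```python
# Illustrative sketch only: a weighted token-wise interaction followed by a
# gated fusion of text and image token features. Names, shapes, and formulas
# are assumptions for exposition, not the DMFN implementation.
import torch
import torch.nn as nn


class WeightedTokenInteraction(nn.Module):
    """Scores every text-image token pair and pools the image tokens for each
    text token with softmax weights (a lightweight cross-attention variant)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.token_weight = nn.Linear(dim, 1)  # learned per-token importance

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (B, Lt, D), image_tokens: (B, Li, D)
        sim = torch.einsum("btd,bid->bti", text_tokens, image_tokens) * self.scale
        attn = sim.softmax(dim=-1)                          # pair-wise correlations
        visual_ctx = torch.einsum("bti,bid->btd", attn, image_tokens)
        # Re-weight each aligned visual context by its text token's importance.
        w = torch.sigmoid(self.token_weight(text_tokens))   # (B, Lt, 1)
        return w * visual_ctx                               # (B, Lt, D)


class GatedFusion(nn.Module):
    """Adaptively mixes textual features with the aligned visual context
    through a learned element-wise gate."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_tokens, visual_ctx):
        g = torch.sigmoid(self.gate(torch.cat([text_tokens, visual_ctx], dim=-1)))
        return g * text_tokens + (1.0 - g) * visual_ctx


if __name__ == "__main__":
    B, Lt, Li, D = 2, 32, 49, 256        # batch, text tokens, image tokens, hidden dim
    text = torch.randn(B, Lt, D)         # e.g., token embeddings from a text encoder
    image = torch.randn(B, Li, D)        # e.g., region/patch embeddings from an image encoder
    fused = GatedFusion(D)(text, WeightedTokenInteraction(D)(text, image))
    print(fused.shape)                   # torch.Size([2, 32, 256])
```

In such a design, the sigmoid gate lets the model fall back to text-only evidence when the image contributes little, which is one plausible way to "adaptively integrate cross-modal features" as described above.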
