Abstract

Predicting the helpfulness of online reviews is of great significance to consumers because it largely affects their purchase decisions. To generate accurate helpfulness predictions for online reviews, we need to consider multimodal information, including both text and images. Although extensive work has made great advances on the Review Helpfulness Prediction (RHP) task, achieving a fine-grained fusion of the two heterogeneous modalities remains challenging. Therefore, we propose a novel RHP method, DMFN, which exploits multi-level information from both text and images to enhance the fine-grained representation of multimodal data. We further employ a weighted token-wise interaction to preserve finer-grained information in a lightweight way. Architecturally, DMFN consists of three parts: a multi-granularity feature generation part, a weighted token-wise interaction part, and a gated multi-level full fusion part. The first part is built to fully extract the visual and linguistic features from multimodal reviews. The second part is designed to decouple the multi-granularity content and then fully exploit the pair-wise correlations to obtain a disentangled manifold for the multimodal inputs. The last part is constructed to adaptively integrate cross-modal features, controlling the fusion of different features at multiple scales. Extensive experimental results on RHP datasets collected from Yelp.com and Amazon.com show that our proposed approach significantly outperforms existing benchmark RHP methods.
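The abstract does not give implementation details, but as a rough illustration of what a weighted token-wise interaction followed by a gated cross-modal fusion could look like, the PyTorch sketch below scores pair-wise text-image token similarities, re-weights the aligned visual context by a learned per-token importance, and mixes it with the text features through an element-wise gate. All class names, shapes, and formulas here are illustrative assumptions, not the authors' actual DMFN design.

```python
# Illustrative sketch only: a weighted token-wise interaction followed by a
# gated fusion of text and image token features. Names, shapes, and formulas
# are assumptions for exposition, not the DMFN implementation.
import torch
import torch.nn as nn


class WeightedTokenInteraction(nn.Module):
    """Scores every text-image token pair and pools the image tokens for each
    text token with softmax weights (a lightweight cross-attention variant)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.token_weight = nn.Linear(dim, 1)  # learned per-token importance

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (B, Lt, D), image_tokens: (B, Li, D)
        sim = torch.einsum("btd,bid->bti", text_tokens, image_tokens) * self.scale
        attn = sim.softmax(dim=-1)                          # pair-wise correlations
        visual_ctx = torch.einsum("bti,bid->btd", attn, image_tokens)
        # Re-weight each aligned visual context by its text token's importance.
        w = torch.sigmoid(self.token_weight(text_tokens))   # (B, Lt, 1)
        return w * visual_ctx                               # (B, Lt, D)


class GatedFusion(nn.Module):
    """Adaptively mixes textual features with the aligned visual context
    through a learned element-wise gate."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_tokens, visual_ctx):
        g = torch.sigmoid(self.gate(torch.cat([text_tokens, visual_ctx], dim=-1)))
        return g * text_tokens + (1.0 - g) * visual_ctx


if __name__ == "__main__":
    B, Lt, Li, D = 2, 32, 49, 256        # batch, text tokens, image tokens, hidden dim
    text = torch.randn(B, Lt, D)         # e.g., token embeddings from a text encoder
    image = torch.randn(B, Li, D)        # e.g., region/patch embeddings from an image encoder
    fused = GatedFusion(D)(text, WeightedTokenInteraction(D)(text, image))
    print(fused.shape)                   # torch.Size([2, 32, 256])
```

In such a design, the sigmoid gate lets the model fall back to text-only evidence when the image contributes little, which is one plausible way to "adaptively integrate cross-modal features" as described above.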
