Abstract

News representation is critical for news recommendation. Most existing methods learn news representations from news texts alone, ignoring the visual information in news. In practice, users may click on a news article not only because they are interested in its title but also because they are attracted by its image, so images are useful for representing news and predicting clicks. Pre-trained visiolinguistic models are powerful at multimodal understanding and can represent news from both textual and visual content. In this paper, we propose a multimodal news recommendation method that incorporates both the textual and visual information of news to learn multimodal news representations. We first extract regions of interest (ROIs) from news images via object detection. We then use a pre-trained visiolinguistic model to encode both news texts and image ROIs, and model their inherent relatedness with co-attentional Transformers. In addition, we propose a crossmodal candidate-aware attention network that selects relevant historical clicked news for accurate modeling of user interest in the candidate news. Experiments validate that incorporating multimodal news information effectively improves the performance of news recommendation.
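To make the two attention mechanisms named above concrete, the following is a minimal PyTorch sketch: one co-attentional Transformer layer in which each modality's queries attend to the other modality's keys and values, and a candidate-aware attention module that weights clicked news by relevance to the candidate. All dimensions, the single-layer depth, the bilinear scoring function, and the dot-product click scorer are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class CoAttentionBlock(nn.Module):
    """Sketch of one co-attentional Transformer layer (ViLBERT-style):
    text queries attend to image ROIs, and ROI queries attend to text.
    Dimensions and layer choices are assumptions for illustration."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.txt_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_norm = nn.LayerNorm(dim)
        self.img_norm = nn.LayerNorm(dim)

    def forward(self, txt: torch.Tensor, img: torch.Tensor):
        # txt: (batch, num_tokens, dim) word representations of the news text
        # img: (batch, num_rois, dim)   representations of detected image ROIs
        txt_out, _ = self.txt_attn(query=txt, key=img, value=img)  # text -> ROIs
        img_out, _ = self.img_attn(query=img, key=txt, value=txt)  # ROIs -> text
        return self.txt_norm(txt + txt_out), self.img_norm(img + img_out)


class CandidateAwareAttention(nn.Module):
    """Hypothetical sketch of crossmodal candidate-aware attention:
    clicked-news vectors are weighted by relevance to the candidate news,
    then pooled into a user-interest vector used to score the click."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)  # assumed bilinear scoring

    def forward(self, clicked: torch.Tensor, candidate: torch.Tensor):
        # clicked:   (batch, num_clicked, dim) multimodal clicked-news vectors
        # candidate: (batch, dim)              multimodal candidate-news vector
        scores = torch.einsum("bnd,bd->bn", self.proj(clicked), candidate)
        weights = torch.softmax(scores, dim=-1)  # relevance to the candidate
        user = torch.einsum("bn,bnd->bd", weights, clicked)
        # click score as the inner product of user and candidate vectors
        return torch.sum(user * candidate, dim=-1)
```

In this sketch, pooling the co-attended token and ROI sequences into the per-news vectors consumed by `CandidateAwareAttention` is left out; any standard pooling (e.g., attention pooling over tokens and ROIs) would fit there.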
