Abstract

The World Wide Web has become a popular source of information and news. Multimodal information, e.g., text supplemented with photographs, is typically used to convey the news more effectively or to attract attention. The photographs can be decorative or depict additional details, but they might also contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases, such measures might give hints to detect fake news, which is an increasingly important topic in today’s society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of the entities in text and photograph by exploiting state-of-the-art computer vision approaches. In contrast to previous work, our system automatically acquires example data from the Web and is applicable to real-world news. Moreover, an approach that quantifies contextual image-text relations is introduced. The feasibility of the approach is demonstrated on two datasets that cover different languages, topics, and domains.

Highlights

  • With the widespread use and availability of digital environments, the World Wide Web plays an essential role in disseminating information and news

  • Results for Location Entities: To evaluate performance for location entities, we distinguished between images of indoor and outdoor scenes using the scene probabilities y_S extracted according to Sect. 4.3 and the hierarchy provided by the Places365 dataset [47]

  • Even when entities are tampered with by substituting locations of similar appearance and low Great Circle Distance (GCD) (Fig. 5b, d), the system performs well and shows promising results
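
The Great Circle Distance (GCD) mentioned in the highlights is the shortest distance between two points on the surface of a sphere. The excerpt does not say how the authors compute it, so the following is only a minimal sketch using the standard haversine formula. The function name `great_circle_distance` and the mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math

# Assumed constant: mean Earth radius in kilometres (not taken from the paper).
EARTH_RADIUS_KM = 6371.0


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Return the haversine distance in km between two (lat, lon) points in degrees.

    Hypothetical helper for illustration; the paper may compute GCD differently.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    central_angle = 2 * math.asin(min(1.0, math.sqrt(a)))
    return EARTH_RADIUS_KM * central_angle


if __name__ == "__main__":
    # Approximate coordinates of Hannover and Berlin, chosen only as an example.
    distance = great_circle_distance(52.37, 9.73, 52.52, 13.40)
    print(f"GCD: {distance:.1f} km")
```

Under this reading, a "low GCD" substitution replaces the true location with one that is geographically close to it, which would make the manipulation harder to spot than a substitution from a distant region.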

Summary

Introduction

With the widespread use and availability of digital environments, the World Wide Web plays an essential role in disseminating information and news. Social media platforms such as Twitter allow users to follow worldwide events and news and have become a popular source of information [6,35,39]. These news articles often leverage different modalities, e.g., texts and images, to convey information more effectively (Fig. 1). Every modality conveys its specific information, and the combination of modalities enables the communication of a coherent multimodal message. In this regard, photograph content can range from decorative (with little or no information about the news event) through rich information enhancements (important or additional details) to even misleading visual information.

Author affiliation (footnote 2): L3S Research Center, Leibniz University Hannover, Hannover, Germany
