Image-text coherence and its implications for multimodal AI.

Malihe Alikhani, Baber Khalid, Matthew Stone

doi:10.3389/frai.2023.1048874

Abstract

Human communication often combines imagery and text into integrated presentations, especially online. In this paper, we show how image-text coherence relations can be used to model the pragmatics of image-text presentations in AI systems. In contrast to alternative frameworks that characterize image-text presentations in terms of the priority, relevance, or overlap of information across modalities, coherence theory postulates that each unit of a discourse stands in specific pragmatic relations to other parts of the discourse, with each relation involving its own information goals and inferential connections. Text accompanying an image may, for example, characterize what's visible in the image, explain how the image was obtained, offer the author's appraisal of or reaction to the depicted situation, and so forth. The advantage of coherence theory is that it provides a simple, robust, and effective abstraction of communicative goals for practical applications. To argue this, we review case studies describing coherence in image-text data sets, predicting coherence from few-shot annotations, and coherence models of image-text tasks such as caption generation and caption evaluation.

Journal: Frontiers in Artificial Intelligence
Publication Date: May 15, 2023
Citations: 2
License type: CC BY 4.0
