StacMR: Scene-Text Aware Cross-Modal Retrieval

Andres Mafla, Lluis Gomez, Dimosthenis Karatzas, Rafael S Rezende, Diane Larlus

doi:10.1109/wacv48630.2021.00227

Abstract

Recent models for cross-modal retrieval have benefited from an increasingly rich understanding of visual scenes, afforded by scene graphs and object interactions, among others. This has resulted in improved matching between the visual representation of an image and the textual representation of its caption. Yet current visual representations overlook a key aspect: the text appearing in images, which may contain crucial information for retrieval. In this paper, we first propose a new dataset that allows exploration of cross-modal retrieval where images contain scene-text instances. Then, armed with this dataset, we describe several approaches that leverage scene text, including a better scene-text aware cross-modal retrieval method that uses specialized representations for text from the captions and text from the visual scene, and reconciles them in a common embedding space. Extensive experiments confirm that cross-modal retrieval approaches benefit from scene text and highlight interesting research questions worth exploring further. Dataset and code are available at europe.naverlabs.com/stacmr.
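The abstract describes matching captions and images in a common embedding space, with scene text contributing to the image-side representation. The sketch below illustrates only that general idea; it is not the authors' model. It assumes precomputed embeddings that already live in one shared space, and the helper names (`fuse`, `rank_images`) and the averaging-based fusion are hypothetical stand-ins for learned components.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(visual: Sequence[float], scene_text: Sequence[float]) -> Vector:
    """Hypothetical fusion: average the normalized visual and scene-text
    vectors, then renormalize. A real model would learn this step."""
    v, t = l2_normalize(visual), l2_normalize(scene_text)
    return l2_normalize([(a + b) / 2 for a, b in zip(v, t)])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the shared space."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_images(caption: Sequence[float], images: Sequence[Sequence[float]]) -> List[int]:
    """Return image indices sorted by descending similarity to the caption.
    Python's sort is stable, so ties keep their original order."""
    scores = [cosine(caption, img) for img in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


# Toy example: two images that look identical but contain different scene text.
visual_a, text_a = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
visual_b, text_b = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
caption = [1.0, 1.0, 0.0]  # mentions the scene and the text visible in image A

print(rank_images(caption, [visual_b, visual_a]))  # visual only: a tie, order unchanged
print(rank_images(caption, [fuse(visual_b, text_b), fuse(visual_a, text_a)]))  # image A ranks first
```

With visual features alone the two images tie; once the scene-text vector is folded in, the image whose text matches the caption moves to the top, which is the kind of effect the abstract attributes to scene-text aware retrieval.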

Similar Papers

Cross-Modal Image Retrieval Considering Semantic Relationships With Many-to-Many Correspondence Loss
Huaying Zhang ... Rintaro Yanagi
IEEE Access, Vol. 11 | 01 Jan 2023

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
Mengjun Cheng ... Jingtuo Liu
01 Jun 2022

Deep-Learning-based Cross-Modal Luxury Microblogs Retrieval
Menghao Ma ... Wenhe Feng
11 Dec 2021

Cross-Modal Audio-Text Retrieval via Sequential Feature Augmentation
Fuhu Song ... Jifeng Hu
17 Mar 2023