OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning

Sheng Liu, Kevin Lin, Junsong Yuan, Zicheng Liu, Lijuan Wang

doi:10.1609/aaai.v36i2.20070

Abstract

We introduce the task of open-vocabulary visual instance search (OVIS). Given an arbitrary textual search query, OVIS aims to return, from an image database, a ranked list of visual instances (i.e., image patches) that satisfy the search intent. The term "open vocabulary" means that there are no restrictions on the visual instances to be searched or on the words that can be used to compose the textual query. We propose to address this search challenge via visual-semantic aligned representation learning (ViSA). ViSA leverages massive image-caption pairs as weak image-level (not instance-level) supervision to learn a rich cross-modal semantic space in which the representations of visual instances (not images) are aligned with those of textual queries, allowing us to measure the similarity between any visual instance and an arbitrary textual query. To evaluate ViSA, we build two datasets, OVIS40 and OVIS1600, and introduce a pipeline for error analysis. Through extensive experiments on both datasets, we demonstrate ViSA's ability to search for visual instances in images not seen during training, given a wide range of textual queries, including those composed of uncommon words. Under the most challenging settings, ViSA achieves an mAP@50 of 27.8% on OVIS40 and a recall@30 of 21.3% on OVIS1600.
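The search step described in the abstract (score every visual instance against a text query in a shared embedding space, then return a ranked list) and the two reported metrics can be sketched as follows. This is a minimal sketch, not ViSA itself: it assumes instance and query embeddings have already been produced by some aligned encoder (not shown), uses cosine similarity as the similarity measure, and all function names are hypothetical. It also assumes that "@50" and "@30" are top-K ranking cutoffs and uses one common AP@K normalization; the paper's exact metric definitions may differ (for example, relevance may additionally require a predicted patch to overlap a ground-truth box).

```python
import math
from typing import List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_instances(query_emb: Sequence[float],
                   instance_embs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Hypothetical helper: score all instances against one query, best match first."""
    scored = [(i, cosine_similarity(query_emb, v)) for i, v in enumerate(instance_embs)]
    # sorted() stays stable with reverse=True, so tied scores keep database order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def recall_at_k(ranked_ids: Sequence[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of all relevant instances that appear among the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for i in ranked_ids[:k] if i in relevant_ids)
    return hits / len(relevant_ids)


def average_precision_at_k(ranked_ids: Sequence[int], relevant_ids: Set[int], k: int) -> float:
    """AP over the top-k results: mean precision at each relevant hit,
    normalized by min(number of relevant instances, k) (an assumed convention)."""
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(ranked_ids[:k], start=1):
        if i in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this cutoff
    return precision_sum / min(len(relevant_ids), k)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for encoded image patches.
    instances = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    query = [1.0, 0.1]  # stands in for an encoded text query
    ranking = rank_instances(query, instances)
    ranked_ids = [i for i, _ in ranking]
    print(ranked_ids)                                               # [0, 2, 1, 3]
    print(recall_at_k(ranked_ids, {0, 1}, k=2))                     # 0.5
    print(round(average_precision_at_k(ranked_ids, {0, 1}, k=3), 3))  # 0.833
```

Cosine similarity ignores embedding magnitude, which suits comparisons in a normalized joint space; in a realistic system the instance embeddings would typically be precomputed and indexed, so only the text query needs to be encoded at search time.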

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 1

Similar Papers

Author response: Lying in a 3T MRI scanner induces neglect-like spatial attention bias
Axel Lindner ... Hans-Otto Karnath
02 Sep 2021

Visual search and visual lobe size: can training on one affect the other?
A.K Gramopadhye ... R Sreenivasan
International Journal of Industrial Ergonomics | VOL. 30
20 May 2002

I can look for it! Modulation of a concurrent Visual Working Memory task in Visual Search in development.
María Quirós-Godoy ... Elena Perez-Hernandez
Frontiers in Psychology | VOL. 13
22 Jul 2022

Visual instance mining from the graph perspective
Wei Li ... Lei Zhang
Multimedia Systems | VOL. 24
04 Feb 2017