Abstract

The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data for the text query. The approach combines fast and accurate learning and retrieval, and enables videos to be returned within seconds of specifying a query. We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We discuss the features suitable for each class of query, for example Fisher vectors or features derived from convolutional neural networks (CNNs), and how these choices impact on the trade-off between three important performance measures for a real-time system of this kind, namely: (1) accuracy, (2) memory footprint, and (3) speed. We also discuss and compare a number of important implementation issues, such as how to remove ‘outliers’ in the downloaded images efficiently, and how to best obtain a single descriptor for a face track. We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 h of unedited footage from BBC News, comprising 5M+ key frames.
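
The category-search step described above (ranking key frames with a discriminative classifier trained on images returned by a web image search) can be made concrete with a short sketch. This is a minimal illustration under our own assumptions, not the paper's implementation: it assumes pre-computed, L2-normalised descriptors (e.g. CNN features or Fisher vectors) for the downloaded images, a fixed negative pool and the dataset key frames, and uses a linear SVM so that scoring the whole dataset reduces to a single matrix-vector product.

```python
# Minimal sketch (our assumptions, not the paper's code): train a linear SVM
# on descriptors of images downloaded for the text query (positives) against a
# fixed pool of negatives, then rank all key frames by classifier score.
import numpy as np
from sklearn.svm import LinearSVC


def l2_normalise(x):
    """Row-wise L2 normalisation of a descriptor matrix."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def rank_keyframes(query_descriptors, negative_pool, keyframe_descriptors, C=1.0):
    """Train an on-the-fly linear classifier and return key-frame indices, best first."""
    X = np.vstack([l2_normalise(query_descriptors), l2_normalise(negative_pool)])
    y = np.hstack([np.ones(len(query_descriptors)), -np.ones(len(negative_pool))])
    clf = LinearSVC(C=C).fit(X, y)
    # Ranking only needs the decision values w.x + b, so scoring every key
    # frame is one matrix-vector product over the pre-computed descriptors.
    scores = l2_normalise(keyframe_descriptors) @ clf.coef_.ravel() + clf.intercept_[0]
    return np.argsort(-scores)
```

Keeping the dataset descriptors pre-computed (and compressed, e.g. with product quantization as discussed in the highlights) is what makes this final scoring step fast enough for interactive use.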

Highlights

  • One of the dreams of large-scale image search is to be able to retrieve images based on their visual content with the same ease, speed and, in particular, accuracy as a Google search of the Web

  • It can be seen that all the multiple-query methods are superior to the ‘single query’ baseline, improving performance by 41% and 78% for the Oxford queries and Google queries (GQ), respectively

  • We explore the use of product quantization (PQ), which has been widely used as a compression method for image features [26,44] and works by splitting the original feature into Q-dimensional sub-blocks, each of which is encoded using a separate vocabulary of cluster centres pre-learned from a training set (a brief code sketch of this encoding follows after this list)

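The product quantization highlight lends itself to a small illustration. The sketch below follows the description given above (split the descriptor into fixed-size sub-blocks and encode each sub-block with its own pre-learned vocabulary of cluster centres); the parameter names and the choice of 256 centres per sub-block are our assumptions, not values taken from the paper.

```python
# Illustrative PQ sketch (names and settings are assumptions): each descriptor
# is split into sub-blocks of block_dim dimensions, and every sub-block is
# replaced by the index of its nearest centre in a per-block k-means codebook
# learned beforehand from a training set.
import numpy as np
from sklearn.cluster import KMeans


def train_pq_codebooks(train_descriptors, block_dim, n_centres=256):
    """Learn one k-means vocabulary per sub-block from a training set."""
    dim = train_descriptors.shape[1]
    assert dim % block_dim == 0, "descriptor dimension must split evenly into sub-blocks"
    codebooks = []
    for start in range(0, dim, block_dim):
        codebooks.append(KMeans(n_clusters=n_centres, n_init=4)
                         .fit(train_descriptors[:, start:start + block_dim]))
    return codebooks


def pq_encode(descriptor, codebooks, block_dim):
    """Compress one descriptor to one centre index per sub-block (one byte each for <=256 centres)."""
    codes = [km.predict(descriptor[i * block_dim:(i + 1) * block_dim].reshape(1, -1))[0]
             for i, km in enumerate(codebooks)]
    return np.array(codes, dtype=np.uint8)
```

With 256 centres per sub-block each sub-block costs a single byte, so the stored descriptors shrink by roughly a factor of 4 × block_dim relative to float32 storage, which is what keeps the memory footprint of a large pre-indexed dataset manageable.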

Summary

Introduction

One of the dreams of large-scale image search is to be able to retrieve images based on their visual content with the same ease, speed and, in particular, accuracy as a Google search of the Web. In this paper we explore a method for bridging this gap by learning from readily available images downloaded from the Web with standard image search engines (such as Google Image search). In this manner the semantic gap is obviated for several types of query (see below), allowing powerful visual models to be constructed from free-form text queries. Putting the two together allows a user to start with a text query, learn a visual model for the specified category, and search an unannotated dataset on its visual content, with results retrieved within seconds.
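
As a rough sketch of how these pieces fit together at query time, the loop below strings the steps into a single function. It is an illustration only: the image download, descriptor extraction and ranking components are passed in as callables (their internals are sketched elsewhere or depend on the deployment), and none of the names are taken from the paper's system.

```python
# High-level sketch of the on-the-fly loop (illustrative; not the paper's code).
# The heavy components are injected as callables: download_images queries a web
# image search for training data, compute_descriptors produces the same feature
# type as the pre-indexed key frames, and rank_fn scores and sorts the dataset.
import numpy as np


def on_the_fly_search(text_query, download_images, compute_descriptors,
                      negative_pool, keyframe_descriptors, rank_fn, top_k=100):
    """Text query -> downloaded training images -> visual model -> ranked key frames."""
    images = download_images(text_query)         # source visual training data at run time
    positives = compute_descriptors(images)      # descriptors for the downloaded positives
    ranking = rank_fn(positives, negative_pool, keyframe_descriptors)
    return np.asarray(ranking)[:top_k]
```

Because everything that depends on the dataset (descriptor extraction, compression, indexing) is done offline, the only work at query time is downloading and describing the query images plus one pass of scoring, which is what keeps the response within seconds.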

