Abstract

Rapid growth in the capture and generation of images and videos is driving the need for more efficient and effective systems for analyzing, searching, and retrieving this data. Specific challenges include supporting automatic content indexing at a large scale and accurately extracting a sufficiently large number of relevant semantic concepts to enable effective search. In this paper, we describe the development of a system for massive-scale visual semantic concept extraction and learning for images and video. The system models the visual semantic space using a hierarchical faceted classification scheme across *objects*, *scenes*, *people*, *activities*, and *events* and utilizes a novel machine learning approach that creates ensemble classifiers from automatically extracted visual features. The ensemble learning and extraction processes are easily parallelizable for distributed processing using Hadoop® and IBM InfoSphere® Streams, which enable efficient processing of large data sets. We report on various applications and quantitative and qualitative results for different image and video data sets.
