Abstract

Rapid growth in the capture and generation of images and videos is driving the need for more efficient and effective systems for analyzing, searching, and retrieving this data. Specific challenges include supporting automatic content indexing at a large scale and accurately extracting a sufficiently large number of relevant semantic concepts to enable effective search. In this paper, we describe the development of a system for massive-scale visual semantic concept extraction and learning for images and video. The system models the visual semantic space using a hierarchical faceted classification scheme across *objects*, *scenes*, *people*, *activities*, and *events* and utilizes a novel machine learning approach that creates ensemble classifiers from automatically extracted visual features. The ensemble learning and extraction processes are easily parallelizable for distributed processing using Hadoop® and IBM InfoSphere® Streams, which enable efficient processing of large data sets. We report on various applications and quantitative and qualitative results for different image and video data sets.
