Finding pictures in context

Rohini K Srihari, Zhongfei Zhang

doi:10.1007/bfb0016492

Abstract

This research explores the interaction of textual and photographic information in multimodal documents. The WWW may be viewed as the ultimate large-scale, dynamically changing multimedia database. Finding useful information on the WWW without encountering numerous false positives, as is currently the case, poses a challenge to multimedia information retrieval (MMIR) systems. We exploit the fact that images do not appear in isolation, but rather with accompanying, collateral text. Taken independently, existing techniques for picture retrieval using (i) collateral text-based and (ii) image-based methods have several limitations. Text-based methods, while very powerful in matching context, do not have access to image content. Image-based methods compute general similarity between images and provide limited semantics. Our research focuses on improving precision and recall in an MMIR system by interactively combining text processing with image processing (IP) in both the indexing and retrieval phases. A picture search engine is demonstrated as an application.

Full Text