Introduction

Interest in image retrieval has increased in large part due to the rapid growth of the World Wide Web. According to a recent study (Lawrence & Giles, 1999), there are 180 million images on the publicly indexable Web, amounting to about 3 TB (terabytes) of image data, and an astounding one million or more digital images are being produced every day (Jain, 93). The need to find a desired image in a collection is shared by many groups, including journalists, engineers, historians, designers, teachers, artists, and advertising agencies. Image needs and uses vary considerably across users in these groups. Some users require access to images based on primitive features such as color, texture, or shape; others require access based on abstract concepts and symbolic imagery. The technology for accessing these images has also advanced rapidly and at present outpaces our understanding of how users interact with visual information.

This paper provides an overview of current research in image information retrieval and outlines areas for future research. The approach is broad and interdisciplinary and focuses on three aspects of image retrieval (IR): text-based retrieval, content-based retrieval, and user interactions with image information retrieval systems. The review concludes with a call for image retrieval evaluation studies similar to TREC.

Text-Based Image Retrieval Research

Most existing IR systems are text-based, but images frequently have little or no accompanying textual information. Historically, the solution has been to develop text-based ontologies and classification schemes for image description. Text-based indexing has many strengths, including the ability to represent both general and specific instantiations of an object at varying levels of complexity. Reviews of the literature pertaining primarily to text-based approaches include Rasmussen (1997), Lancaster (1998), Lunin (1987), and Cawkell (1993).
Long before images could be digitized, access to image collections was provided by librarians, curators, and archivists through text descriptors or classification codes. These indexing schemes were often developed in-house and reflect the unique characteristics of a particular collection or clientele. This is still common practice, and recently Zheng (1999) and Goodrum & Martin (1997) have reported on the hybridization of multiple schemas for classifying historic costume collections. Hourihane (1989) has also reviewed a number of these unique systems for image classification. To date, very little research has been conducted on the relative effectiveness of these various approaches to image indexing in electronic environments.

Attempts to provide general systems for image indexing include the Getty's Art and Architecture Thesaurus (AAT), which consists of over 120,000 terms for the description of art, art history, architecture, and other cultural objects, and the Library of Congress Thesaurus for Graphic Materials (LCTGM). The AAT currently provides access to thirty-three hierarchical categories of image description organized under seven broad facets (Associated Concepts, Physical Attributes, Styles and Periods, Agents, Activities, Materials, and Objects). The approach in many collections, particularly in general library environments, has been to apply an existing cataloging system such as the Dewey Decimal System to image description, using the LCTGM or ICONCLASS.

Controlled vocabularies and classification schemes do not, however, entirely solve the problem of assigning terms to describe images. The textual representation of images is problematic because an image conveys information both about what it actually depicts and about what it is about. Shatford (1986) situates this discussion within a framework based on Panofsky's approach to analyzing iconographical levels of meaning in images (1955). …