Image Access, the Semantic Gap, and Social Tagging as a Paradigm Shift

Corinne Jorgensen

doi:10.7152/acro.v18i1.12868

Abstract

The recent phenomenon of “social tagging” or “distributed indexing” raises a number of questions regarding long-held beliefs and practices of the classification and indexing community. This workshop paper covers several of these issues, such as the locus of authority, control, and meaning, and suggests that we may be observing the emergence of a new paradigm of knowledge organization.

The Semantic Gap

The “semantic gap” is mentioned frequently in the literature of image access. The term originated in computer science (Smeulders et al., 2000) and is still used in the CS literature today to refer to the difference between two descriptions of an object expressed in different languages, specifically the difference between a human-readable description and a computational representation. In a computational representation, the description of a simple image of an object moves from the level of individual pixels, to assemblages of image primitives such as color, shape/region, and texture, to the assemblage and recognition of an object, at least at the level of a simple “basic object.” Object recognition requires a level of “understanding” of what is being represented; this is achieved by inferring what different combinations of primitives may represent. For example, black spots or black stripes on an orange or tan background, together with an assemblage of potential “leg,” “body,” “tail,” and “head” shapes, perhaps combined with “nature colors,” could be interpreted as a leopard or a tiger. The process is fraught with stumbling blocks such as occlusion, angle of view, scale, shadow, and lack of uniqueness, to name a few. However, it is at the level of object recognition that human image description often begins. With the development of automated methods of content-based image retrieval, the term “semantic gap” has come to refer to the larger issue of the gap between these image primitives, or low-level features, and the context-sensitive meanings human beings associate with them.
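The gap between low-level features and meaning can be made concrete with a toy sketch. It assumes that images are tiny grids of named colors, and the functions `color_histogram`, `vertical_continuity`, and `guess_label` are hypothetical illustrations rather than any real content-based retrieval system. A color histogram, a common low-level primitive, cannot tell a striped pattern from a spotted one (a checkerboard stands in for spots here); a crude spatial feature can separate them, yet nothing in these numbers reaches the context-sensitive meanings a human viewer brings to the image.

```python
from collections import Counter

# Hypothetical image model: a grid of color names, one per "pixel".
Image = list[list[str]]

# Same colors in the same proportions (8 black, 8 orange),
# arranged as vertical stripes in one and as a checkerboard ("spots") in the other.
STRIPES: Image = [["black", "orange", "black", "orange"] for _ in range(4)]
SPOTS: Image = [
    ["black", "orange", "black", "orange"],
    ["orange", "black", "orange", "black"],
    ["black", "orange", "black", "orange"],
    ["orange", "black", "orange", "black"],
]


def color_histogram(image: Image) -> Counter:
    """Low-level primitive: how often each color occurs, ignoring position."""
    return Counter(pixel for row in image for pixel in row)


def vertical_continuity(image: Image) -> float:
    """Fraction of vertically adjacent pixel pairs that share a color."""
    pairs = [
        (image[r][c], image[r + 1][c])
        for r in range(len(image) - 1)
        for c in range(len(image[0]))
    ]
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)


def guess_label(image: Image) -> str:
    """Hypothetical hand-written rule mapping primitives to a 'basic object' guess."""
    hist = color_histogram(image)
    if not {"black", "orange"}.issubset(hist):
        return "unknown"
    # Long vertical runs of one color read as stripes; broken runs as spots.
    return "tiger?" if vertical_continuity(image) > 0.5 else "leopard?"


if __name__ == "__main__":
    # The histograms are identical, so color alone cannot separate the two.
    print(color_histogram(STRIPES) == color_histogram(SPOTS))  # True
    print(guess_label(STRIPES), guess_label(SPOTS))  # tiger? leopard?
```

Even when the rule guesses "tiger?", it says nothing about power, captivity, or endangerment; those meanings sit on the far side of the gap.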
This brings us beyond object recognition and understanding into more abstract levels of semantic meaning. The meanings or emotions associated with even one image can be many, and they can vary across time and place. For a human, recognition of familiar objects is instantaneous, and an image of a tiger, once recognized, can represent multiple concepts such as power, ferocity, freedom (or a lack thereof, as in a caged tiger), or even endangered species. These concepts form a gestalt of the object, gestalt being a German word roughly translated as a complete pattern or configuration. A definition of gestalt has three parts: a thing, its context or environment, and the relationship between them (Wymore, 2002). Studies in cognitive science suggest that this gestalt may be as important as sensory stimuli in the actual recognition of the object.
