Research on multimedia information retrieval (MIR) has recently attracted rapidly growing interest. A prominent feature of this research trend is its simultaneous but independent emergence within several fields of computer science. The resulting richness of paradigms, methods and systems may, in the long run, lead to a fragmentation of efforts and slow down progress. The primary goal of this study is to promote the integration of methods and techniques for MIR by contributing a conceptual model that brings together, in a unified and coherent perspective, the many efforts being carried out under the label of MIR. The model offers a retrieval capability that spans not only two media, text and images, but also several dimensions: form, content and structure. In this way, it reconciles similarity-based methods with semantics-based ones, providing guidelines for the design of systems that offer a generalized multimedia retrieval service, in which the existing forms of retrieval not only coexist but can be combined in any desired manner. The model is formulated in terms of a fuzzy description logic, which plays a twofold role: (1) it directly models semantics-based retrieval, and (2) it offers an ideal framework for integrating the multimedia and multidimensional aspects of retrieval mentioned above. The model also accounts for relevance feedback in both text and image retrieval, integrating known techniques for taking user judgments into account. The implementation of the model is addressed by presenting a decomposition technique that reduces query evaluation to the processing of simpler requests, each of which can be solved by means of widely known methods for text retrieval, image retrieval and semantic processing. A prototype for multidimensional image retrieval is presented that shows this decomposition technique at work in a significant case.