Multimodal Information Retrieval: Challenges and Future Trends

Faraz Hasan, Mohammad Ubaidullah Bokhari

doi:10.5120/12951-9967

Abstract

Multimodal information retrieval is a research problem of great interest across domains because of the huge collections of multimedia data available in different modalities such as text, image, audio and video. Researchers are trying to build multimodal information retrieval on machine learning, support vector machines, neural networks, neuroscience and related approaches in order to provide an efficient retrieval system that fulfills the user's need. This paper gives an overview of multimodal information retrieval and of the challenges to its progress.

General Terms
Multi Modal Information Retrieval, Information Retrieval.

Keywords
Multi Modal Information Retrieval, Information Retrieval, Machine Learning, SVM, Semantic Gap, Query Reformulation, Fusion Techniques.

1. INTRODUCTION
The growth of digital content on the web has reached impressive rates in the last decade. The convergence of web, mobile access and digital television has boosted the production of image, audio and video content, making the web a truly multimedia platform. The popularity of multimedia now demands intelligent and efficient techniques for managing the large amount of multimedia data. Because every modality has its own retrieval models, this paper describes the work done in each modality one by one.

In the field of textual information retrieval there are broadly two modes of retrieval: retrieval based on keywords (ad hoc retrieval) and retrieval based on categories (ontology) [1]. Each mode has its pros and cons and is therefore used in different types of applications. There is a need for a generic system that can work with various techniques and recognize the application intelligently. There are broadly four traditional information retrieval applications: content searching, text classification/clustering, content management and question answering. Most of these use statistical or machine learning approaches [2].
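Keyword-based (ad hoc) text retrieval of the kind mentioned above is commonly served from an inverted index, which maps each term to the documents that contain it. The following is a minimal sketch, not a method taken from the paper: the function names `build_inverted_index` and `keyword_search`, the whitespace tokenization and the toy collection are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # index.get avoids creating empty entries for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "oral cancer screening guidelines",
    "d2": "video of oral surgery",
    "d3": "cancer research funding",
}
index = build_inverted_index(docs)
hits = keyword_search(index, "oral cancer")
```

Here `hits` contains only `d1`, the single document that holds both query terms. A real system would normally rank the matches (for example with a statistical weighting scheme) instead of returning an unordered set.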
Indexing techniques are used to make retrieval faster. Researchers have also used query reformulation techniques to retrieve the information the user actually requires. To better understand the user's need, Lavrenko and Croft [3] discuss relevance feedback, which measures how relevant the retrieved information is to the user's need, either through an explicit feedback interface or implicitly from the documents the user opens.

In the area of image retrieval, there are broadly four modes of retrieval, namely retrieval through descriptors, texture or pattern recognition, feature-based retrieval and retrieval through objects [4]. Early systems mainly used image descriptors based on color, sometimes combined with texture and shape [5]. Local feature extraction from image patches or segmented image regions, based on feature matching, is discussed in [6]. More recent techniques are based on inverted files, bag-of-visual-words [7], Fisher vectors [8] and the Scale Invariant Feature Transform (SIFT) [9]. Past work broadly discusses two types of features: local features and global features [10].

In the field of audio retrieval, most early systems retrieve audio by metadata (artist, song title, album title, etc.) [11] or by content, either by converting audio signals into text words or by measuring similarity in rhythm and tempo. Researchers have also used annotation-based approaches for audio retrieval.

Multimodal information retrieval (MMIR) is the search for information in any modality (text, image, audio and video) on the web, using combinations of two or more retrieval models. A unified framework for multimodal search and retrieval, which introduces a novel representation of multimodal data called the Content Object (CO), is described in [12]. Recent efforts in the field of multimodal retrieval systems have led to a growing research community and a number of academic and industrial projects.
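One simple way to combine two or more retrieval models, as the definition of MMIR above requires, is late fusion: each modality's retriever scores documents independently, and the normalized scores are merged by a weighted sum. The sketch below illustrates that idea only; it is not the Content Object framework of [12], and the function names, the min-max normalization, the weights and the scores are all assumptions chosen for illustration.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def late_fusion(score_lists, weights):
    """Weighted sum of normalized per-modality scores.

    A document missing from a modality contributes 0 for that modality.
    Returns (doc_id, fused_score) pairs sorted from best to worst.
    """
    fused = {}
    for modality, scores in score_lists.items():
        weight = weights.get(modality, 0.0)
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from a text retriever and an image retriever.
text_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
image_scores = {"b": 0.9, "c": 0.5, "d": 0.1}

ranking = late_fusion(
    {"text": text_scores, "image": image_scores},
    {"text": 0.6, "image": 0.4},  # assumed weights, not tuned
)
```

With these made-up numbers, document `b` ranks first because it scores reasonably well in both modalities, ahead of `a`, which is the best text match but has no image score. Normalizing before summing matters because raw scores from different retrieval models are generally not on comparable scales.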
Besides single-mode retrieval systems, the latest technologies target multimodal retrieval engines. Current development explicitly focuses on queries such as “Show me the video and documents related to the input query” or “Give me all media (text, image, audio and video) that contain information about mouth cancer”. To support such challenging requests, researchers need to work on several fronts, some of which are listed below:

• New semantic models for combining individual media models.
• New retrieval engines for crossing the media boundary during search.
• New interfaces for managing the input and presentation of various media data, etc.
• New retrieval models for retrieving related information within the same medium as well as across media.

Full Text