Overview of Uni-modal and Multi-modal Representations for Classification Tasks

Aryeh Wiesen,Yaakov Hacohen-Kerner

doi:10.1007/978-3-319-91947-8_41

Abstract

Classification is one of the most fundamental tasks in data mining and machine learning. It is being applied in an increasing number of fields, e.g. filtering, identification, information retrieval, information extraction, and similarity detection. A basic and necessary condition for the success of a classification task is the proper representation of the information it wishes to classify. Classification is needed in domains that are based on uni-modal representations such as text, images, audio, and speech, as well as in domains that are based on multi-modal representations. This paper aims to provide a short review on the developing area of multi-modal representations for classification with emphasis on state-of-the-art systems in this area. Firstly, fundamentals of uni-modal representations are given. Secondly, an overview of multi-modal representations is given. Thirdly, various related systems using multi-modal representations and the datasets used by them are briefly summarized with a comparative summary of these systems.

Full Text