Abstract

Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans continuously integrate perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving sets of mutually related signals, called multi-modal signals. The simultaneous processing of multi-modal data can in fact reveal information that remains hidden when the different modalities are considered independently. This dissertation deals with the modelling and analysis of natural multi-modal signals. The challenge consists in representing sets of data streams of different nature, such as audio-video sequences, that are interrelated in some complex and unknown manner, in such a way that useful information shared by the different modalities can be extracted and used intuitively. In this sense, signal representations have to model the structural properties of the observed phenomenon, so that data are expressed in terms of a few meaningful elements: if information can be represented using only a few components, those components capture its salient characteristics. In order to represent multi-modal data efficiently, we advocate the use of sparse signal decompositions over redundant sets of functions (called dictionaries).

In this thesis we consider both application-related and theoretical aspects of multi-modal signal processing. We propose two models for multi-modal signals that explain multi-modal phenomena in terms of temporally proximal events present in the different modalities. A first, simple model is inspired by human perception of multi-modal stimuli and relies on the representation of the different data streams as sparse sums of dictionary elements. This type of representation makes it possible to intuitively define meaningful events in the different modalities and to discover correlated multi-modal patterns. Taking inspiration from this first model, we introduce a representational framework for multi-modal data based on their sparse decomposition over dictionaries of multi-modal functions. Instead of decomposing each modality separately over a dictionary and searching for correlations between the extracted patterns, we impose correlations between the modalities at the model level. Since such correlations are difficult to formalize, we also propose a method to learn dictionaries of synchronous multi-modal basis elements.

Concerning the applications presented in this dissertation, we tackle two major audiovisual fusion problems, namely audiovisual source localization and separation. Although many of the ideas developed in this work are completely general, we focus on this field because it offers the broadest range of applications for this research. The theoretical frameworks developed throughout the thesis are used to localize, separate and extract audio-video sources in audiovisual sequences. Algorithms for cross-modal source localization and blind audiovisual source separation are tested on challenging real-world multimedia sequences. Experiments show that the proposed approach leads to promising results for several newly designed multi-modal signal processing algorithms, and that careful modelling of the structural properties of the data can convey useful information for understanding complex multi-modal phenomena.
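
To make the central idea of the abstract concrete, the following is a minimal sketch of a greedy sparse decomposition (matching pursuit) of a signal over a redundant dictionary. It is purely illustrative: the function name, dictionary, and parameters are assumptions for this example and are not taken from the thesis, which additionally extends such decompositions to jointly learned multi-modal dictionaries.

    # Minimal matching-pursuit sketch: approximate a 1-D signal as a sparse sum of
    # atoms drawn from a redundant (overcomplete) dictionary.
    # All names and parameters here are illustrative, not from the dissertation.
    import numpy as np

    def matching_pursuit(signal, dictionary, n_atoms):
        """Greedily select `n_atoms` dictionary columns approximating `signal`.

        dictionary : (signal_len, n_functions) array with unit-norm columns,
                     typically with n_functions >> signal_len (redundant).
        Returns a list of (atom_index, coefficient) pairs and the residual.
        """
        residual = signal.astype(float).copy()
        decomposition = []
        for _ in range(n_atoms):
            correlations = dictionary.T @ residual        # project residual on every atom
            best = int(np.argmax(np.abs(correlations)))   # pick the most correlated atom
            coeff = correlations[best]
            residual -= coeff * dictionary[:, best]       # remove its contribution
            decomposition.append((best, coeff))
        return decomposition, residual

    # Toy usage: a random redundant dictionary with unit-norm atoms.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)                        # normalise columns
    x = 2.0 * D[:, 10] - 0.5 * D[:, 200]                  # signal built from two atoms
    atoms, res = matching_pursuit(x, D, n_atoms=2)
    print(atoms, np.linalg.norm(res))                     # should identify atoms 10 and 200

In a multi-modal setting of the kind advocated above, each selected atom would carry a component per modality (for example an audio waveform and a video patch sharing the same temporal support), so that synchronous audio-video events are captured by single dictionary elements rather than matched after independent decompositions.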
