Blind Audio-Visual Localization and Separation via Low-Rank and Sparsity.

Jie Pu, Jie Shen, Maja Pantic, Yannis Panagakis, Stavros Petridis

doi:10.1109/tcyb.2018.2883607

Abstract

The ability to localize visual objects that are associated with an audio source and at the same time to separate the audio signal is a cornerstone in audio-visual signal-processing applications. However, available methods mainly focus on localizing only the visual objects, without audio separation abilities. Moreover, these methods often rely on either laborious preprocessing steps to segment video frames into semantic regions, or additional supervision to guide their localization. In this paper, we aim to address the problem of visual source localization and audio separation in an unsupervised manner and avoid all preprocessing or post-processing steps. To this end, we devise a novel structured matrix decomposition method that decomposes the data matrix of each modality as a superposition of three terms: 1) a low-rank matrix capturing the background information; 2) a sparse matrix capturing the correlated components between the two modalities and, hence, uncovering the sound source in the visual modality and the associated sound in the audio modality; and 3) a third sparse matrix accounting for uncorrelated components, such as distracting objects in the visual modality and irrelevant sound in the audio modality. The generality of the proposed method is demonstrated by applying it to three applications, namely: 1) visual localization of a sound source; 2) visually assisted audio separation; and 3) active speaker detection. Experimental results indicate the effectiveness of the proposed method in these application domains.
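
The low-rank-plus-sparse idea behind the decomposition can be illustrated with a much simpler, single-modality stand-in. The sketch below is not the method proposed in the paper: it is standard robust PCA (principal component pursuit solved with a basic augmented-Lagrangian loop), which splits one data matrix into a low-rank term and a single sparse term, with no cross-modal coupling and no third term. The function names (robust_pca, make_synthetic), the parameter defaults, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (proximal map of the l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x, tau):
    """Shrink the singular values of x by tau (proximal map of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def robust_pca(m, n_iter=1000, tol=1e-7):
    """Split m into low + sparse with a plain augmented-Lagrangian loop.

    Illustrative two-term stand-in only; the weights below are common
    textbook defaults, not values taken from the paper.
    """
    lam = 1.0 / np.sqrt(max(m.shape))        # weight on the sparse term
    mu = m.size / (4.0 * np.abs(m).sum())    # penalty parameter
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(n_iter):
        # Update the low-rank part, then the sparse part, then the multiplier.
        low = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low, sparse


def make_synthetic(n=60, rank=2, corrupt_frac=0.05, seed=0):
    """Hypothetical test data: a rank-`rank` background plus sparse outliers."""
    rng = np.random.default_rng(seed)
    low = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
    sparse = np.zeros((n, n))
    mask = rng.random((n, n)) < corrupt_frac
    sparse[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())
    return low, sparse


if __name__ == "__main__":
    low_true, sparse_true = make_synthetic()
    low_est, sparse_est = robust_pca(low_true + sparse_true)
    rel_err = np.linalg.norm(low_est - low_true) / np.linalg.norm(low_true)
    print(f"relative error of recovered low-rank part: {rel_err:.3e}")
```

In this simplified picture the low-rank output plays the role of the background and the sparse output plays the role of everything that stands out from it. The method described in the abstract goes further by using two sparse terms per modality, so that components correlated across audio and video are separated from uncorrelated distractors.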

Journal: IEEE Transactions on Cybernetics
Publication Date: Dec 13, 2018
Citations: 45

Similar Papers

Audio-visual object localization and separation using low-rank and sparsity
Jie Pu ... Yannis Panagakis
01 Mar 2017

Audio-video fusion strategies for active speaker detection in meetings
Lionel Pibre ... Isabelle Ferrané
Multimedia Tools and Applications | VOL. 82
28 Sep 2022

Active Speaker Detection using audio-visual sensor array
Jatin Kheradiya ... Sandeep Reddy C
01 Dec 2014

Human motion intention recognition method with visual, audio, and surface electromyography modalities for a mechanical hand in different environments
Feiyun Xiao ... Yong Wang
Biomedical Signal Processing and Control | VOL. 79
13 Aug 2022
