Abstract
Meeting or conference assistance is a popular application that typically requires compact configurations of co-located audio and visual sensors. This paper proposes a novel solution for online separation of an unknown and time-varying number of moving sources using only a single microphone array co-located with a single visual device. The approach exploits the complementary nature of simultaneous audio and visual measurements through a model-centric three-stage process of detection, tracking, and (spatial) filtering, which performs separation in a block-wise, recursive fashion. Fusing the measurements requires solving the multi-modal space-time permutation problem, which arises because the audio and visual measurements not only reside in different observation spaces but are also unidentified or unlabeled (with respect to the unknown and time-varying number of sources), and are subject to noise, extraneous measurements, and missing measurements. A labeled random finite set tracking filter is applied to resolve the permutation problem and to recursively estimate the source identities and trajectories. A time-varying set of generalized side-lobe cancellers is then constructed from the tracking estimates to perform online separation. Evaluations are undertaken with live human speakers.
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing