A Study on Online Source Extraction in the Presence of Changing Speaker Positions

Jens Heitkaemper, Thomas Fehér, Michael Freitag, Reinhold Haeb-Umbach

doi:10.1007/978-3-030-31372-2_17

Abstract

Multi-talker speech and moving speakers still pose a significant challenge to automatic speech recognition systems. Assuming an enrollment utterance of the target speaker is available, the so-called SpeakerBeam concept has recently been proposed to extract the target speaker from a speech mixture. If multi-channel input is available, spatial properties of the speaker can be exploited to support the source extraction. In this contribution we investigate different approaches to exploiting such spatial information. In particular, we ask how useful this information is if the target speaker changes his/her position. To this end, we present a SpeakerBeam-based source extraction network that is adapted to work on moving speakers by recursively updating the beamformer coefficients. Experimental results are presented on two data sets, one with artificially created room impulse responses, and one with real room impulse responses and noise recorded in a conference room. Interestingly, spatial features turn out to be advantageous even if the speaker position changes.
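The idea of recursively updating beamformer coefficients can be illustrated with a minimal sketch. The abstract does not specify the beamformer or the update rule, so everything below is an assumption made for illustration: a mask-based MVDR beamformer in the reference-channel (Souden) formulation, spatial covariance matrices tracked with an exponential forgetting factor `alpha`, and a target mask that is simply given (in the paper's setting it would presumably come from the extraction network). The function name `online_mvdr` and all parameter names are hypothetical.

```python
import numpy as np

def online_mvdr(Y, mask, alpha=0.95, ref=0, eps=1e-6):
    """Frame-by-frame mask-based MVDR beamforming (illustrative sketch only).

    Y:    complex STFT of shape (T, F, D): frames, frequency bins, microphones
    mask: target presence mask of shape (T, F), values in [0, 1] (assumed given)
    alpha: forgetting factor; smaller values adapt faster to position changes
    """
    T, F, D = Y.shape
    eye = np.eye(D, dtype=complex)
    phi_s = np.tile(eps * eye, (F, 1, 1))  # target spatial covariance, (F, D, D)
    phi_n = np.tile(eye, (F, 1, 1))        # noise spatial covariance, (F, D, D)
    out = np.zeros((T, F), dtype=complex)
    for t in range(T):
        y = Y[t]                                      # (F, D)
        outer = y[:, :, None] * y[:, None, :].conj()  # y y^H per bin, (F, D, D)
        m = mask[t][:, None, None]
        # Recursive (exponentially weighted) covariance updates
        phi_s = alpha * phi_s + (1 - alpha) * m * outer
        phi_n = alpha * phi_n + (1 - alpha) * (1 - m) * outer
        # w = (phi_n^-1 phi_s) u_ref / trace(phi_n^-1 phi_s)
        num = np.linalg.solve(phi_n + eps * eye, phi_s)
        trace = np.trace(num, axis1=1, axis2=2)
        w = num[:, :, ref] / (trace[:, None] + eps)
        out[t] = np.einsum('fd,fd->f', w.conj(), y)   # w^H y per bin
    return out
```

Because the covariance estimates forget old frames at rate `alpha`, the beamformer coefficients follow the speaker when the spatial statistics change; the trade-off between tracking speed and estimation variance is controlled by that single parameter in this sketch.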
