Abstract

The processing of speech corrupted by interfering overlapping speakers is one of the most challenging problems for today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers in the mixture. In some scenarios, such as smart personal devices, we may instead be interested in recovering only one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for extracting a target speaker from a mixture based on an adaptation utterance spoken by that speaker. Formulating the problem as speaker extraction avoids certain issues such as label permutation and the need to determine the number of speakers in the mixture. With SpeakerBeam, we jointly learn to extract a representation characterizing the target speaker from the adaptation utterance and to use this representation to extract that speaker from the mixture. We explore several ways to do this, mostly inspired by speaker adaptation of acoustic models for automatic speech recognition. We evaluate the performance on the widely used WSJ0-2mix and WSJ0-3mix datasets, as well as on versions of these datasets modified with additional noise or more realistic overlapping patterns. We further analyze the learned behavior by examining the speaker representations and assessing the effect of the length of the adaptation data. The results show the benefit of including speaker information in the processing and the effectiveness of the proposed method.
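
To make the idea of speaker-conditioned extraction concrete, the sketch below shows one common way such a system can be wired up: an auxiliary network summarizes the adaptation utterance into a speaker embedding, which then scales a hidden layer of a mask-estimation network element-wise so that only the target speaker is kept. This is an illustrative assumption of the general approach, not the paper's exact architecture; all layer sizes, feature dimensions, and class names (AuxiliaryNet, TargetSpeakerExtractor) are hypothetical.

```python
# Minimal illustrative sketch of target speaker extraction with an
# adaptation utterance (PyTorch). Shapes and layer choices are assumptions.
import torch
import torch.nn as nn


class AuxiliaryNet(nn.Module):
    """Maps adaptation-utterance features to a single speaker embedding."""
    def __init__(self, feat_dim=40, emb_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, adapt_feats):                  # (batch, time, feat_dim)
        # Time-average frame-level projections into one embedding per speaker.
        return self.proj(adapt_feats).mean(dim=1)    # (batch, emb_dim)


class TargetSpeakerExtractor(nn.Module):
    """Estimates a time-frequency mask for the target speaker only."""
    def __init__(self, feat_dim=257, emb_dim=256, hidden=512):
        super().__init__()
        self.aux = AuxiliaryNet(emb_dim=emb_dim)
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.adapt = nn.Linear(hidden, emb_dim)       # layer modulated by the embedding
        self.decoder = nn.Linear(emb_dim, feat_dim)   # mask estimation head

    def forward(self, mixture_feats, adapt_feats):
        spk_emb = self.aux(adapt_feats)               # (batch, emb_dim)
        h, _ = self.encoder(mixture_feats)            # (batch, time, hidden)
        # Element-wise speaker conditioning of the hidden activations.
        h = self.adapt(h) * spk_emb.unsqueeze(1)
        mask = torch.sigmoid(self.decoder(h))         # (batch, time, feat_dim)
        return mask * mixture_feats                   # estimated target magnitudes


# Usage with random stand-ins for spectrogram features of the mixture and
# of the target speaker's adaptation utterance.
mixture = torch.randn(2, 100, 257)
adaptation = torch.randn(2, 50, 40)
extracted = TargetSpeakerExtractor()(mixture, adaptation)   # (2, 100, 257)
```

Because the network is told which speaker to keep, the same architecture applies regardless of how many speakers are in the mixture, which is one reason the extraction formulation sidesteps the label-permutation problem mentioned above.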
