Abstract
In speech separation, the identities of the speakers can be an important cue for discriminating the speech signals in a mixture and separating them more accurately. A few recent studies have used speaker embeddings as additional information, but they often require prior information about the target speaker or rely on noisy speaker embeddings extracted from the mixture signal. In this article, we propose a monaural speech separation method that utilizes, in the later separator blocks, speaker embeddings extracted from the intermediate separation results produced by the early stages of the separator network. The later blocks of separator networks that consist of repeated blocks, such as the fully-convolutional time-domain audio separation network (Conv-TasNet) or the network based on successive downsampling and resampling of multi-resolution features (SuDoRM-RF), are modified to take the speaker information in the form of an affine transformation of, or an addition to, the original input tensor. Experimental results show that the proposed methods significantly improve the performance of existing separation systems with a moderate number of additional parameters.
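To illustrate the conditioning mechanism described above, the following is a minimal sketch (not the authors' implementation) of how a speaker embedding could modulate a separator block's intermediate features, either as a FiLM-style affine transformation or as a simple addition. The module name, tensor shapes, and embedding dimension are assumptions made for the example; PyTorch is used as the framework.

```python
# Hypothetical sketch of speaker-conditioned separator features.
# Shapes and dimensions are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn


class SpeakerConditioning(nn.Module):
    """Injects a speaker embedding into a separator block's input tensor,
    either via a per-channel affine transformation or via addition."""

    def __init__(self, emb_dim: int, feat_channels: int, mode: str = "affine"):
        super().__init__()
        self.mode = mode
        if mode == "affine":
            # Predict per-channel scale (gamma) and shift (beta) from the embedding.
            self.to_gamma = nn.Linear(emb_dim, feat_channels)
            self.to_beta = nn.Linear(emb_dim, feat_channels)
        elif mode == "add":
            # Project the embedding to the feature dimension and add it.
            self.proj = nn.Linear(emb_dim, feat_channels)
        else:
            raise ValueError(f"unknown mode: {mode}")

    def forward(self, feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, time); spk_emb: (batch, emb_dim)
        if self.mode == "affine":
            gamma = self.to_gamma(spk_emb).unsqueeze(-1)  # (batch, channels, 1)
            beta = self.to_beta(spk_emb).unsqueeze(-1)
            return gamma * feats + beta
        return feats + self.proj(spk_emb).unsqueeze(-1)


if __name__ == "__main__":
    cond = SpeakerConditioning(emb_dim=256, feat_channels=512, mode="affine")
    feats = torch.randn(4, 512, 1000)   # intermediate separator features
    spk_emb = torch.randn(4, 256)       # embedding from an early separated output
    print(cond(feats, spk_emb).shape)   # torch.Size([4, 512, 1000])
```

In a repeated-block separator such as Conv-TasNet or SuDoRM-RF, a module like this would be applied at the input of each later block, with the speaker embedding obtained by running a speaker encoder on the intermediate separated signals from the earlier blocks.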