Abstract
Accurately separating each speaker's clean speech in a multi-speaker scenario is a critical problem. In most cases, however, smart devices such as smartphones interact with only one specific user, so the speech separation models deployed on these devices only need to extract the target speaker's speech. A voiceprint, which reflects the speaker's voice characteristics, provides prior knowledge for target speech separation. How to efficiently integrate voiceprint features into existing speech separation models to improve their target speech separation performance is therefore an interesting problem that has not been fully explored. This paper addresses this issue, and our contributions are as follows. First, two different voiceprint features (i.e., MFCCs and the d-vector) are explored for enhancing the performance of three speech separation models. Second, three feature fusion methods are proposed to efficiently fuse the voiceprint features with the magnitude spectrograms originally used by the speech separation models. Third, a target speech extraction method that utilizes the fused features is proposed for two speaker-independent models. Experiments demonstrate that speech separation models integrated with voiceprint features via the three feature fusion methods can effectively extract the target speaker's speech.
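The abstract does not detail the three fusion methods, but the general idea of conditioning a separation model on a voiceprint can be illustrated with one common strategy: broadcasting the target speaker's embedding over time and concatenating it with each frame of the mixture's magnitude spectrogram. The sketch below is a minimal, hypothetical illustration; the array shapes and the concatenation-based fusion are assumptions, not the paper's specific methods.

```python
import numpy as np

# Hypothetical shapes; the abstract does not specify them.
n_frames, n_freq_bins = 200, 257   # magnitude spectrogram of the mixture
d_vector_dim = 256                 # size of the speaker embedding (d-vector)

mixture_spec = np.abs(np.random.randn(n_frames, n_freq_bins))  # |STFT| of the mixed speech
d_vector = np.random.randn(d_vector_dim)                       # target speaker's voiceprint

# Broadcast the d-vector over time and concatenate it with every
# spectrogram frame along the feature axis, yielding the fused input
# that a separation model could consume instead of the raw spectrogram.
fused = np.concatenate(
    [mixture_spec, np.tile(d_vector, (n_frames, 1))],
    axis=1,
)

print(fused.shape)  # (200, 513) = (n_frames, n_freq_bins + d_vector_dim)
```

Frame-level MFCCs of an enrollment utterance could be fused in a similar way, e.g., after pooling them into a fixed-length vector.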