Abstract

In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, the deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.

Highlights

  • Speaker verification is the task of verifying the claimed speaker identity based on the given speech samples and has become a key technology for personal authentication in many commercial applications, forensics and law enforcement [1]

  • The results demonstrate that although the proposed joint factor embedding (JFE) framework is composed of a simple x-vector-like network, it can provide embeddings with more speaker-discriminative information than systems with more complicated architectures

  • In this paper, a novel approach for extracting an embedding vector robust to variability caused by nuisance attributes for speaker verification is proposed


Summary

INTRODUCTION

Speaker verification is the task of verifying the claimed speaker identity based on the given speech samples and has become a key technology for personal authentication in many commercial applications, forensics, and law enforcement [1]. For extracting a channel-robust embedding for speaker verification, Lmain would be the speaker cross-entropy Lspkr defined in (4), and Lsub would be the channel cross-entropy, which can be computed as follows:

Lsub = −∑_{m=1}^{M} rm log rm(ω),

where M is the number of different channels (e.g., recording devices) in the training set, and rm and rm(ω) are the m-th components of the one-hot channel label r and the channel classifier's softmax output r(ω), respectively. Another way to achieve disentanglement is to train the embedding network and the subtask network in a competitive manner via adversarial training [22]. Once the embedding vectors are extracted, both ωspkr and ωnuis are fed into the speaker and nuisance classification networks.
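As a concrete illustration of the channel cross-entropy Lsub described above, here is a minimal NumPy sketch (the function name, variable names, and example values are illustrative assumptions, not from the paper):

```python
import numpy as np

def channel_cross_entropy(r, r_omega, eps=1e-12):
    """Cross-entropy between the one-hot channel label r and the
    channel classifier's softmax output r(omega):
        Lsub = -sum_{m=1}^{M} r_m * log r_m(omega)
    eps guards against log(0). Names here are illustrative only."""
    return float(-np.sum(r * np.log(r_omega + eps)))

# Illustrative example with M = 3 channels; the true channel is index 1.
r = np.array([0.0, 1.0, 0.0])           # one-hot channel label
r_omega = np.array([0.2, 0.7, 0.1])     # channel classifier softmax output
loss = channel_cross_entropy(r, r_omega)  # equals -log(0.7)
```

Because the label is one-hot, only the term for the true channel survives the sum, so the loss reduces to the negative log-probability the classifier assigns to the correct channel.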

TRAINING FOR JOINT FACTOR EMBEDDING
EXPERIMENTAL SETUP
Findings
CONCLUSION
