A Supervised Learning Method for Improving the Generalization of Speaker Verification Systems by Learning Metrics from a Mean Teacher

Ju-Ho Kim, Jee-Weon Jung, Ha-Jin Yu, Hye-Jin Shim

doi:10.3390/app12010076

Abstract

Most recent speaker verification tasks are studied under open-set evaluation scenarios that reflect real-world conditions. The characteristics of these tasks imply that generalization to unseen speakers is a critical capability. This study therefore aims to improve the generalization of the system in order to enhance speaker verification performance. To achieve this goal, we propose a novel supervised learning method for speaker verification based on the mean teacher framework. The mean teacher network is the temporal average of deep neural network parameters; it can produce more accurate and stable representations than the fixed weights at the end of training and is conventionally used for semi-supervised learning. Leveraging the success of the mean teacher framework in many studies, the proposed supervised learning method exploits the mean teacher network as an auxiliary model for better training of the main model, the student network. By learning the reliable intermediate representations derived from the mean teacher network as well as one-hot speaker labels, the student network is encouraged to explore more discriminative embedding spaces. The experimental results demonstrate that the proposed method reduces the equal error rate by a relative 11.61% compared to the baseline system.
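The temporal averaging described in the abstract can be illustrated with a minimal sketch. It assumes the averaging is an exponential moving average (EMA) of the student's parameters, which is how mean teacher networks are commonly implemented; the abstract does not state the exact update rule. The names `ema_update`, `combined_loss`, `decay`, and `weight` are hypothetical, and the mean squared distance below is only a stand-in for whatever metric the paper actually learns from the teacher.

```python
from typing import Dict, List

# Parameters are modeled as named lists of floats to keep the sketch
# dependency-free; a real system would use framework tensors instead.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> None:
    """Update the teacher in place as a moving average of the student.

    Assumed rule: teacher = decay * teacher + (1 - decay) * student.
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = decay * teacher_values[i] + (1.0 - decay) * s


def combined_loss(
    classification_loss: float,
    student_repr: List[float],
    teacher_repr: List[float],
    weight: float = 1.0,
) -> float:
    """Add a consistency term to the supervised (one-hot label) loss.

    The consistency term here is the mean squared distance between the
    student's and the teacher's representations. This is an assumption
    for illustration, not the loss used in the paper.
    """
    squared = [(s - t) ** 2 for s, t in zip(student_repr, teacher_repr)]
    consistency = sum(squared) / len(squared)
    return classification_loss + weight * consistency


if __name__ == "__main__":
    teacher = {"w": [0.0, 0.0]}
    student = {"w": [1.0, 2.0]}
    # After one step the teacher moves a small fraction toward the student.
    ema_update(teacher, student, decay=0.9)
    print(teacher)
    print(combined_loss(0.5, [1.0, 2.0], [1.0, 0.0], weight=0.5))
```

In a training loop under these assumptions, the student would be updated by gradient descent on the combined loss, and `ema_update` would be called after each step, so the teacher changes more slowly and smoothly than the student.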

Highlights

Speaker verification (SV) is the task of authenticating whether the speaker of an unknown input utterance matches the target speaker, and it is widely used in applications such as voice assistant systems [1,2]
The baseline system is a RawNet2-based model with several modifications that has been reported to improve the equal error rate (EER) over the original RawNet2
This result indicates that the supervised mean teacher (MT) framework proposed in this study can improve the generalization of SV systems

Summary

Introduction

Academic Editor: Arcangelo

Speaker verification (SV) is the task of authenticating whether the speaker of an unknown input utterance matches the target speaker, and it is widely used in applications such as voice assistant systems [1,2]. Recent SV systems are primarily studied in an open-set scenario, in which testing uses utterances of speakers not seen in the training phase; this requires strong generalization [2,3]. Considering these characteristics of SV, many researchers have aimed to extract discriminative speaker embeddings from utterances by exploiting deep neural networks (DNNs). We noted from the results of a previous study that simply averaging DNN parameters after each step in the training phase can lead to convergence to better local minima [4]. This technique is called "temporal averaging"; the temporally averaged weights can lead to more stable and accurate results than the final weights obtained when training has been completed. The MT is the temporal averaging model of the student network and can generate a relatively

Journal: Applied Sciences
Publication Date: Dec 22, 2021
Citations: 2
License type: CC BY 4.0
