PLDA inspired Siamese networks for speaker verification

Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy

doi:10.1016/j.csl.2022.101383

Abstract

Deep learning methodologies in state-of-the-art speaker recognition systems are predominantly limited to the extraction of recording-level embeddings. This is usually followed by generative modeling of the embeddings to output the verification score. In this paper, we explore a fully neural approach in which the model outputs the verification score directly from the acoustic feature inputs. This model, termed the Siamese neural network (SiamNN), combines embedding extraction and back-end modeling into a single processing pipeline. The back-end modeling uses a neural approach to PLDA modeling, called neural probabilistic linear discriminant analysis (NPLDA), in which the verification score is computed as a discriminative similarity function. Having a single neural SiamNN model allows all the modules to be optimized jointly using a verification cost. Several speaker recognition experiments are performed on the SITW, VOiCES, and NIST SRE datasets, where the proposed SiamNN model is shown to significantly improve over the state-of-the-art x-vector PLDA baseline system (relative improvements of up to 35% in the primary cost metric). We also provide a detailed analysis of the influence of hyper-parameters, the choice of loss function, and data sampling strategies for training the model. In particular, we highlight that optimization based on the proposed soft detection cost function improves over the other loss functions considered.
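The abstract does not spell out the soft detection cost, so the following is only a minimal sketch of one common way such a cost can be made differentiable: the hard accept/reject decision at a threshold is replaced by a sigmoid, and the miss and false-alarm rates are computed from these soft decisions. The function name and the parameters `threshold`, `alpha`, and `beta` are assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_detection_cost(scores, labels, threshold=0.0, alpha=10.0, beta=1.0):
    """Smooth approximation of a normalized detection cost (illustrative).

    scores: verification scores, one per trial
    labels: 1 for target (same-speaker) trials, 0 for non-target trials
    threshold, alpha, beta: hypothetical names for the decision threshold,
        the sigmoid sharpness, and the false-alarm weight (assumptions)
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Soft "accept" decision per trial: close to 1 when score >> threshold.
    accept = [sigmoid(alpha * (s - threshold)) for s in scores]

    # Soft miss rate: target trials that are (softly) rejected.
    p_miss = sum(t * (1.0 - a) for t, a in zip(labels, accept)) / n_target
    # Soft false-alarm rate: non-target trials that are (softly) accepted.
    p_fa = sum((1 - t) * a for t, a in zip(labels, accept)) / n_nontarget

    return p_miss + beta * p_fa
```

Under these assumptions, well-separated target and non-target scores give a cost near zero, while a larger `alpha` makes the soft cost track the hard-decision cost more closely at the expense of flatter gradients away from the threshold.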

Journal: Computer Speech & Language
Publication Date: Apr 19, 2022
Citations: 3

Similar Papers

A Study on Bias and Fairness in Deep Speaker Recognition
Amirhossein Hajavi ... Ali Etemad
04 Jun 2023

NPLDA: A Deep Neural PLDA Model for Speaker Verification
Shreyas Ramoji ... Sriram Ganapathy
01 Nov 2020

Robust Feature Extraction Using Modulation Filtering of Autoregressive Models
Sriram Ganapathy ... Hynek Hermansky
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
01 Aug 2014

Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset
Arundhati Niwatkar ... Yuvraj Kanse
ICST Transactions on Scalable Information Systems
09 Apr 2024
