Reduction In Equal Error Rate Research Articles

AbstractIn the i‐vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i‐vectors. PLDA defines an i‐vector subspace that compensates the unwanted variability and helps to discriminate among speaker‐phrase pairs. The channel or session variability manifested in i‐vectors are known to be nonlinear in nature. PLDA training, however, assumes the variability to be linearly separable, thereby causing loss of important discriminating information. Besides, the i‐vector estimation, itself, is known to be poor in case of short utterances. This paper attempts to address these issues using a simple hierarchy‐based system. A modified fuzzy‐clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. Thereafter, a separate i‐vector/PLDA model is trained for each of the subspaces. The sparser alignment owing to subspace‐specific universal background model and the relatively reduced dimensions of variability in individual subspaces help to train more effective i‐vector/PLDA models. Also, vocal source features are complementary to mel frequency cepstral coefficients, which are transformed into i‐vectors using mixture model technique. As a consequence, vocal source features and i‐vectors tend to have complementary information. Thus using vocal source features for classification in a hierarchy tree may help to differentiate some of the speaker‐phrase classes, which otherwise are not easily discriminable based on i‐vectors. The proposed technique has been validated on Part 1 of RSR2015 database, and it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i‐vector/PLDA system.

Multimodal systems are a workaround to enhance the robustness and effectiveness of biometric systems. A proper multimodal dataset is of the utmost importance to build such systems. The literature presents some multimodal datasets, although, to the best of our knowledge, there are no previous studies combining face, iris/eye, and vital signals such as the Electrocardiogram (ECG). Moreover, there is no methodology to guide the construction and evaluation of a chimeric dataset. Taking that fact into account, we propose to create a chimeric dataset from three modalities in this work: ECG, eye, and face. Based on the Doddington Zoo criteria, we also propose a generic and systematic protocol imposing constraints for the creation of homogeneous chimeric individuals, which allow us to perform a fair and reproducible benchmark. Moreover, we have proposed a multimodal approach for these modalities based on state-of-the-art deep representations built by convolutional neural networks. We conduct the experiments in the open-world verification mode and on two different scenarios (intra-session and inter-session), using three modalities from two datasets: CYBHi (ECG) and FRGC (eye and face). Our multimodal approach achieves impressive decidability of 7.20 ± 0.18, yielding an almost perfect verification system (i.e., Equal Error Rate (EER) of 0.20% ± 0.06) on the intra-session scenario with unknown data. On the inter-session scenario, we achieve a decidability of 7.78 ± 0.78 and an EER of 0.06% ± 0.06. In summary, these figures represent a gain of over 28% in decidability and a reduction over 11% of the EER on the intra-session scenario for unknown data compared to the best-known unimodal approach. Besides, we achieve an improvement greater than 22% in decidability and an EER reduction over 6% in the inter-session scenario.

Reduction In Equal Error Rate Research Articles

Related Topics

Articles published on Reduction In Equal Error Rate

The Proposal of Countermeasures for DeepFake Voices on Social Media Considering Waveform and Text Embedding

Increasing the Robustness of i-vectors with Model Compensated First Order Statistics

Performance Analysis of Hybrid RR Algorithm for Anomaly Detection in Streaming Data

New Acoustic Features for Synthetic and Replay Spoofing Attack Detection

Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask

Speaker-Phrase-Specific Adaptation of PLDA Model for Improved Performance in Text-Dependent Speaker Verification

Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

A fuzzy‐clustering‐based hierarchical i‐vector/probabilistic linear discriminant analysis system for text‐dependent speaker verification

Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters

ChimericalDataset Creation Protocol Based on Doddington Zoo: A Biometric Application with Face, Eye, and ECG.

Addressing Text-Dependent Speaker Verification Using Singing Speech

MakeUpMirror: mirroring make‐ups and verifying faces post make‐up

I-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques

Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

A novel scores fusion approach applied on speaker verification under noisy environments

Selection of Heart-Biometric Templates for Fusion

Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection.

Sentence‐HMM state‐based i‐vector/PLDA modelling for improved performance in text dependent single utterance speaker verification

Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Reduction In Equal Error Rate Research Articles

Related Topics

Articles published on Reduction In Equal Error Rate

The Proposal of Countermeasures for DeepFake Voices on Social Media Considering Waveform and Text Embedding

Increasing the Robustness of i-vectors with Model Compensated First Order Statistics

Performance Analysis of Hybrid RR Algorithm for Anomaly Detection in Streaming Data

New Acoustic Features for Synthetic and Replay Spoofing Attack Detection

Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask

Speaker-Phrase-Specific Adaptation of PLDA Model for Improved Performance in Text-Dependent Speaker Verification

Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

A fuzzy‐clustering‐based hierarchical i‐vector/probabilistic linear discriminant analysis system for text‐dependent speaker verification

Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters

ChimericalDataset Creation Protocol Based on Doddington Zoo: A Biometric Application with Face, Eye, and ECG.

Addressing Text-Dependent Speaker Verification Using Singing Speech

MakeUpMirror: mirroring make‐ups and verifying faces post make‐up

I-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques

Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

A novel scores fusion approach applied on speaker verification under noisy environments

Selection of Heart-Biometric Templates for Fusion

Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection.

Sentence‐HMM state‐based i‐vector/PLDA modelling for improved performance in text dependent single utterance speaker verification

Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes