I-vector Representation Research Articles

This paper presents a generalized i-vector representation framework with phonetic tokenization and tandem features for both text-independent and text-dependent speaker verification. In the conventional i-vector framework, the tokens for calculating the zero-order and first-order Baum-Welch statistics are Gaussian Mixture Model (GMM) components trained on acoustic-level MFCC features. Beyond MFCC, we believe that phonetic information offers another direction that can benefit system performance. Our contribution lies in integrating phonetic information into the i-vector representation through several extensions, forming a more generalized i-vector framework. First, the tokens for calculating the zero-order statistics are extended from the MFCC-trained GMM components to phonemes, trigrams, and tandem-feature-trained GMM components, using phoneme posterior probabilities. Second, given the zero-order statistics (posterior probabilities on tokens), the feature used to calculate the first-order statistics is also extended from MFCC to tandem features, and is not necessarily the same feature employed by the tokenizer. Third, the zero-order and first-order statistics vectors are concatenated and represented by the simplified supervised i-vector approach, followed by the standard Probabilistic Linear Discriminant Analysis (PLDA) back-end. We study different token and feature combinations, and we show that the feature-level fusion of acoustic-level MFCC features and phonetic-level tandem features with GMM-based i-vector representation achieves the best performance for text-independent speaker verification. Furthermore, we demonstrate that the phonetic-level phoneme constraints introduced by the tandem features help the text-dependent speaker verification system reject wrong-password trials and improve performance dramatically.
Experimental results are reported on the NIST SRE 2010 common condition 5 female part task and the RSR 2015 part 1 female part task for text-independent and text-dependent speaker verification, respectively. For the text-independent task, the proposed generalized i-vector representation outperforms the i-vector baseline by 53% relative in terms of equal error rate (EER) and normalized minDCF values. For the text-dependent task, the proposed approach also reduces the EER significantly, by 23% to 90% relative depending on the type of trial.
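The decoupling described in the abstract above, where token posteriors may come from one source and the features from another, can be illustrated with a minimal NumPy sketch. The function name `baum_welch_stats` and the array layout are assumptions made for illustration, not the authors' implementation, and the first-order statistics here are left uncentered.

```python
import numpy as np

def baum_welch_stats(posteriors, features):
    """Accumulate zero-order and first-order statistics over one utterance.

    posteriors: (T, C) array, per-frame posterior probability of each token
                (GMM component, phoneme, ...), produced by any tokenizer.
    features:   (T, D) array of per-frame features; these need not be the
                features the tokenizer itself was trained on.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    features = np.asarray(features, dtype=float)
    if posteriors.shape[0] != features.shape[0]:
        raise ValueError("posteriors and features must cover the same frames")
    # Zero-order: total soft count of frames assigned to each token, shape (C,)
    zero_order = posteriors.sum(axis=0)
    # First-order: posterior-weighted sum of features per token, shape (C, D)
    first_order = posteriors.T @ features
    return zero_order, first_order

# Toy utterance: 2 frames, 2 tokens, 2-dimensional features
N, F = baum_welch_stats([[1.0, 0.0], [0.5, 0.5]], [[1.0, 2.0], [3.0, 4.0]])
```

Because the two inputs only have to agree on the number of frames, the same routine covers any token and feature pairing of the kind the abstract compares.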

Sparse representation classification (SRC) has attracted attention in many signal processing domains in the past few years. Recently, it has been successfully explored for the speaker recognition task using Gaussian mixture model (GMM) mean supervectors as speaker representations, which typically have dimensions on the order of tens of thousands. As a result, the complexity of such systems becomes very high. With the use of state-of-the-art i-vector representations, the dimension of GMM mean supervectors can be reduced effectively. However, the i-vector approach involves a high-dimensional data projection matrix that is learned using factor analysis over a huge amount of data from a large number of speakers. In addition, estimating the i-vector for a given utterance is a computationally complex procedure. Motivated by these facts, we explore the use of data-independent projection approaches for reducing the dimensionality of GMM mean supervectors. The data-independent projection methods studied in this work include a normal random projection and two kinds of sparse random projections. The study is performed on SRC-based speaker identification using the NIST SRE 2005 dataset, which includes channel-matched and channel-mismatched conditions. We find that using data-independent random projections for the dimensionality reduction of the supervectors results in only a 3% absolute loss in performance compared to the data-dependent (i-vector) approach. We highlight that with highly sparse random projection matrices having ±1 as non-zero coefficients, a significant reduction in computational complexity is achieved in finding the projections. Further, as these matrices do not require floating-point representations, their storage requirement is also very small compared to that of the data-dependent or the normal random projection matrices.
These reduced-complexity sparse random projections would be of interest for speaker recognition applications implemented on platforms with low computational power.
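As a rough illustration of a sparse random projection with ±1 non-zero coefficients, the sketch below uses one well-known construction in which each entry is +1 or -1 with probability 1/(2s) each and 0 otherwise. The abstract does not say which constructions were used, so the sparsity parameter `s`, the function names, and the dimensions are assumptions for illustration only.

```python
import numpy as np

def sparse_sign_matrix(out_dim, in_dim, s=3.0, seed=0):
    """Draw a data-independent projection matrix with entries in {-1, 0, +1}.

    Each entry is +1 or -1 with probability 1/(2s) and 0 with probability
    1 - 1/s (an assumed construction). Storing only signs as int8 avoids
    floating-point storage for the matrix.
    """
    rng = np.random.default_rng(seed)
    p = 1.0 / (2.0 * s)
    values = np.array([-1, 0, 1], dtype=np.int8)
    return rng.choice(values, size=(out_dim, in_dim), p=[p, 1.0 - 2.0 * p, p])

def project(signs, supervector, s=3.0):
    """Project a supervector to signs.shape[0] dimensions.

    The product with the sign matrix needs only additions and subtractions
    of supervector entries; the single scalar sqrt(s / k) is applied once
    so that squared norms are preserved in expectation.
    """
    k = signs.shape[0]
    return np.sqrt(s / k) * (signs @ np.asarray(supervector, dtype=float))

# Hypothetical sizes: reduce a 2000-dimensional vector to 50 dimensions
R = sparse_sign_matrix(50, 2000, s=3.0, seed=0)
x = np.random.default_rng(1).standard_normal(2000)
y = project(R, x, s=3.0)
```

A larger `s` makes the matrix sparser, which lowers the number of additions per projection at the cost of a noisier embedding.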

Articles published on I-vector Representation

DNN and i-vector combined method for speaker recognition on multi-variability environments

Estimating Uniqueness of I-Vector-Based Representation of Human Voice

Non-intrusive speech quality prediction based on the blind estimation of clean speech and the i-vector framework

Task-Driven Variability Model for Speaker Verification

Voice verification and identification using i-vector representation

HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification

Regularization of neural network model with distance metric learning for i-vector based spoken language identification

Speaker diarization system using HXLPS and deep neural network

I-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition

Improved i-Vector Representation for Speaker Diarization

Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification

Rapid Language Identification

A Study of Acoustic Features for Emotional Speaker Recognition in I-Vector Representation

Low-complexity speaker verification with decimated supervector representations

Deep bottleneck features for spoken language identification.

Exploring Data-Independent Dimensionality Reduction in Sparse Representation-Based Speaker Identification

I‐vector representation based on bottleneck features for language identification

Factor analysis of auto-associative neural networks with application in speaker verification.
