NTIMIT Database Research Articles

This paper presents a new method to improve features derived from filtering in the autocorrelation domain. These features, called relative autocorrelation sequence mel-frequency cepstral coefficients (RAS-MFCCs), are among the successful autocorrelation-domain features for noise-robust speaker recognition. RAS-MFCCs are derived by applying temporal filtering to autocorrelation sequences under the assumption that the corrupting noise is stationary. However, using only the filtered sequences can degrade performance because it restricts the available information, and in real environments the stationarity assumption may leave non-stationary noise components in the filtered autocorrelation sequences. To compensate for the restricted information, we propose a multi-streaming feature extraction that uses both the autocorrelation sequences and the temporally filtered autocorrelation sequences. Furthermore, we propose a hybrid feature representation that combines the multi-streaming feature extraction with sub-band feature recombination, in order to reduce both the noise effects in the autocorrelation sequences and the residual-noise effects in the temporally filtered autocorrelation sequences. To evaluate the proposed hybrid speaker recognition system in noisy conditions, we use the TIMIT and NTIMIT databases. Experiments on TIMIT demonstrate the effectiveness of the proposed hybrid method, which reduces errors by up to 26% and 14% compared with conventional RAS-MFCCs in speaker identification and verification, respectively. On NTIMIT, the proposed hybrid feature representation reduces errors by 24% and 18% compared with conventional RAS-MFCCs in speaker identification and verification, respectively.
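
The temporal filtering step behind RAS can be illustrated with a minimal NumPy sketch. It assumes the noise is additive, stationary, and uncorrelated with speech, so that the noisy autocorrelation at frame m and lag k is approximately the speech autocorrelation plus a noise term that does not depend on m; any filter along the frame axis whose taps sum to zero then cancels that term. The regression (delta-style) filter, its half-width of 2 frames, the helper names, and the simple side-by-side concatenation used for the two streams are illustrative assumptions, not the paper's exact configuration. A full RAS-MFCC front end would go on to derive cepstral coefficients (mel filterbank, log, DCT) from each stream, which this sketch omits.

```python
import numpy as np


def frame_autocorrelation(frames: np.ndarray, num_lags: int) -> np.ndarray:
    """Short-time autocorrelation r[m, k] for lags k = 0..num_lags-1 of each frame m."""
    num_frames, frame_len = frames.shape
    r = np.empty((num_frames, num_lags))
    for k in range(num_lags):
        # Biased estimate: sum_n x[n] * x[n + k], divided by the frame length.
        r[:, k] = np.sum(frames[:, : frame_len - k] * frames[:, k:], axis=1) / frame_len
    return r


def relative_autocorrelation(r: np.ndarray, half_width: int = 2) -> np.ndarray:
    """Filter each lag's trajectory across frames with a regression (delta-style) filter.

    The taps -L..L sum to zero, so any component that is constant over frames
    (the stationary-noise autocorrelation under the RAS assumption) is removed.
    The filter type and half_width are assumptions for illustration.
    """
    L = half_width
    taps = np.arange(-L, L + 1)
    # Repeat the first and last frames so the output has one row per input frame.
    padded = np.pad(r, ((L, L), (0, 0)), mode="edge")
    out = np.zeros_like(r, dtype=float)
    for j in taps:
        out += j * padded[L + j : L + j + r.shape[0]]
    return out / np.sum(taps ** 2)


def multi_stream(r: np.ndarray, half_width: int = 2) -> np.ndarray:
    """Hypothetical two-stream feature: unfiltered and filtered sequences side by side."""
    return np.hstack([r, relative_autocorrelation(r, half_width)])
```

For example, `r = frame_autocorrelation(frames, 12)` followed by `multi_stream(r)` gives 24 values per frame. Adding the same lag vector to every row of `r` leaves `relative_autocorrelation(r)` unchanged, which is the stationary-noise cancellation RAS relies on; noise whose autocorrelation varies from frame to frame is not cancelled, which is the residual noise the hybrid representation targets.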

We present an original approach to automatic speaker identification that is especially applicable to environments that partially corrupt the frequency spectrum of the signal. The general principle is to split the whole frequency domain into several subbands, apply statistical recognizers to each subband independently, and then recombine their outputs to yield a global score and a global recognition decision. The choice of subband architecture and recombination strategy is discussed in particular. This technique has been shown to be robust for speech recognition when narrow-band noise degradation occurs. We first objectively verify this robustness for the speaker identification task. We also study which information is actually used to recognize speakers. For this purpose, speaker identification experiments on independent subbands are conducted for the 630 speakers of the TIMIT and NTIMIT databases. The results show that speaker-specific information is not equally distributed among the subbands. In particular, the low-frequency subbands (below 600 Hz) and the high-frequency subbands (above 3000 Hz) are more speaker-specific than the middle-frequency ones. In addition, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, we propose a highly redundant parallel architecture in which most of these correlations are preserved. The performance obtained with this new system, using linear recombination strategies, is equivalent to that of a conventional full-band recognizer on clean and telephone speech. Experiments on speech corrupted by unpredictable noise show that this approach adapts better to noisy environments than a conventional system, especially when some of the recognizers are pruned.
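
The subband principle (independent recognizers per band, linearly recombined into a global score and decision) can be sketched as follows. The input is assumed to be a matrix of per-subband scores, such as log-likelihoods, with one row per subband recognizer and one column per enrolled speaker; how those scores are produced (the per-band statistical models and the subband architecture) is outside the sketch. The equal default weights, the helper name `recombine_subband_scores`, and the `keep` argument used to prune recognizers are illustrative assumptions, not the weighting or pruning rule used in the paper.

```python
import numpy as np


def recombine_subband_scores(scores, weights=None, keep=None):
    """Linearly recombine per-subband scores and return (speaker index, global scores).

    scores  : array of shape (num_subbands, num_speakers), e.g. log-likelihoods.
    weights : one non-negative weight per subband; equal weights if omitted (assumption).
    keep    : optional indices of subband recognizers to retain; the others are
              pruned by giving them zero weight (hypothetical pruning interface).
    """
    scores = np.asarray(scores, dtype=float)
    num_bands = scores.shape[0]
    if weights is None:
        weights = np.ones(num_bands)
    weights = np.asarray(weights, dtype=float).copy()
    if keep is not None:
        mask = np.zeros(num_bands, dtype=bool)
        mask[list(keep)] = True
        weights[~mask] = 0.0
    if weights.sum() <= 0:
        raise ValueError("at least one subband must have positive weight")
    weights /= weights.sum()
    # Global score per speaker: weighted sum of the subband scores.
    global_scores = weights @ scores
    # Global decision: the speaker with the highest recombined score.
    return int(np.argmax(global_scores)), global_scores
```

With the invented score matrix `[[0, -1, -2], [-5, 0, -1], [-1, -3, 0]]`, equal weights select speaker 2; if band 1 were known to be corrupted by narrow-band noise, pruning it with `keep=[0, 2]` changes the decision to speaker 0. Linear recombination keeps the combination cheap and makes pruning a matter of zeroing weights.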

Articles published on NTIMIT Database

Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes

Enhancement of Speech Recognition System by neural network approaches of Clustering

Non-symmetric time resolution for spectral feature trajectories

Spectral amplitude nonlinearities for improved noise robustness of spectral features for use in automatic speech recognition

Hidden data transmission in mixed excitation linear prediction coded speech using quantisation index modulation

Robust speaker recognition based on filtering in autocorrelation domain and sub-band feature recombination

Robust Detection of Phone Boundaries Using Model Selection Criteria With Few Observations

Optimizing feature complementarity by evolution strategy: Application to automatic speaker verification

Noise-Robust Speaker Recognition Using Subband Likelihoods and Reliable-Feature Selection

Environmental Independent ASR Model Adaptation/Compensation by Bayesian Parametric Representation

Combining classifier decisions for robust speaker identification

Speaker-specific mapping for text-independent speaker recognition

A hybrid syllable recognition system based on vowel spotting

Subband architecture for automatic speaker recognition

Modeling of the glottal flow derivative waveform with application to speaker identification

Speaker identification and verification using Gaussian mixture speaker models

Large population speaker identification using clean and telephone speech

Energy onset times for speaker identification