Speaker recognition in noisy environments using auxiliary information and Bayesian networks

Mijail Arciénega

doi:10.5075/epfl-thesis-3644

Abstract

Speaker recognition systems achieve acceptable performance under controlled laboratory conditions. In real-life environments, however, their performance degrades drastically. The principal cause is the mismatch between the testing and the training recording conditions, and this mismatch arises mainly from two factors: degradation introduced by background noise and distortion produced by the transmission channel. Despite major advances in the speaker recognition field, no optimal solution to this problem has yet been found. This thesis proposes new methods for speaker recognition systems that use auxiliary information to reduce the influence of background noise and transmission channel distortion. For this purpose, it provides statistical models capable of taking several sources of information into account within a unified framework. While most state-of-the-art speaker recognition systems use spectral envelope features alone, other features can complement the information about the speaker's individuality and also provide information about the conditions under which testing takes place. Both kinds of information can help improve the performance of a speaker recognition system. In this thesis, we focus on three auxiliary sources of information: the pitch, the voicing status, and the reliability status of the spectral envelope features. These auxiliary features are used together with the spectral envelope features. Algorithms are developed to efficiently extract the pitch and the voicing status from noisy telephone-quality speech, and an algorithm for extracting the reliability status of the spectral envelope features is also provided.
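As a rough illustration of what frame-level pitch and voicing features look like, the sketch below uses plain normalized autocorrelation on a single frame. It is not the extraction algorithm developed in the thesis, which targets noisy telephone-quality speech; the 8 kHz sampling rate, the 60 to 400 Hz search range and the 0.3 voicing threshold are illustrative assumptions.

```python
import numpy as np


def pitch_and_voicing(frame, fs=8000, fmin=60.0, fmax=400.0, threshold=0.3):
    """Return (voiced, f0_hz) for one frame using normalized autocorrelation.

    Assumed, illustrative parameters: fs (telephone-band sampling rate),
    the pitch search range [fmin, fmax] and the voicing threshold.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                      # remove DC offset
    energy = np.dot(x, x)
    if energy == 0.0:                     # silent frame: treat as unvoiced
        return False, 0.0
    # Autocorrelation for non-negative lags, normalized so that r[0] == 1.
    r = np.correlate(x, x, mode="full")[len(x) - 1:] / energy
    lag_min = int(fs / fmax)              # shortest plausible pitch period
    lag_max = min(int(fs / fmin), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    # A strong periodicity peak marks the frame as voiced.
    voiced = bool(r[lag] > threshold)
    return voiced, (fs / lag if voiced else 0.0)
```

For a clean 200 Hz tone sampled at 8 kHz, the strongest peak falls at a lag of 40 samples, giving a voiced frame with an estimated pitch of 200 Hz; a white-noise frame has no strong peak and is labeled unvoiced.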
Two new statistical modeling approaches for handling auxiliary sources of information are proposed: the state-dependent transitions (SDT) model and the state-dependent states (SDS) model. Both models take into account the temporal dependencies between features of a given source of information, as well as the dependencies between features that belong to different sources of information. Speaker identification experiments were conducted to evaluate the SDT modeling approach, and further experiments evaluated the novel pitch-dependent GMM system, which is based on the SDS modeling approach. The results of all these experiments show that the modeling techniques proposed in this thesis are capable of capturing the key characteristics of the speech features and their dependencies.

The concepts of conditional independence and conditional models are central to the SDT and SDS models. One of the major drawbacks of these models, however, is that the dependencies between features are fixed. To eliminate this drawback, a more flexible approach using Bayesian networks is introduced. Bayesian networks can manage the dependencies between features via conditional models and can handle the conditional independence relationships between features. We show in this thesis how Bayesian networks can complement and replace SDS and SDT models. Two Bayesian-network-based systems are presented for handling auxiliary information in speaker recognition: the first uses the pitch, the voicing status and the spectral envelope features; the second extends the first by adding the reliability status to the set of features. Both proposed systems were compared with a GMM-UBM (Gaussian mixture model/universal background model) baseline system. Experiments evaluated the proposed approaches both in noisy conditions and when different transmission channels were used for testing and for training the speaker models.
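To make the idea of conditioning a spectral model on an auxiliary feature concrete, the sketch below scores each frame with a speaker GMM and a UBM that are both selected by the frame's discrete auxiliary state (for example a voicing label, with a hypothetical encoding 0 = unvoiced, 1 = voiced), and returns the average log-likelihood ratio. This is a simplified illustration in the spirit of the state-conditioned models, not the thesis's SDS, SDT or Bayesian network implementation: it models neither temporal dependencies nor a learned dependency structure. The function names and the dictionary layout of the models are assumptions made for this example, and diagonal covariances are assumed.

```python
import numpy as np


def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log p(x) under a diagonal-covariance GMM.

    X: (T, D) frames; weights: (M,); means and variances: (M, D).
    """
    X = np.atleast_2d(X)
    diff = X[:, None, :] - means[None, :, :]                         # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_comp = log_comp + np.log(weights)
    # Log-sum-exp over mixture components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()


def state_dependent_llr(X, states, speaker, ubm):
    """Average log-likelihood ratio with state-selected models.

    speaker and ubm (hypothetical layout): dict mapping each discrete
    auxiliary state to a (weights, means, variances) tuple.
    """
    states = np.asarray(states)
    total = 0.0
    for s in np.unique(states):
        Xs = X[states == s]               # frames observed in state s
        total += np.sum(gmm_frame_loglik(Xs, *speaker[s])
                        - gmm_frame_loglik(Xs, *ubm[s]))
    return total / len(X)
```

If every frame carries the same state, the function reduces to the usual GMM-UBM average log-likelihood ratio used by a spectral-envelope-only baseline.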
The results obtained show that the Bayesian-network-based systems using auxiliary information outperform the classical GMM-UBM system, which uses only spectral envelope features. The Bayesian-network-based systems proposed in this thesis effectively reduce the influence of noise and of transmission channel mismatch.


Similar Papers

Wavelet based dynamic Mel Frequency Cepstral Coefficients (MFCC) and block truncation techniques for efficient speaker identification under narrowband noise conditions
... et al.
International Journal of the Physical Sciences, Vol. 8
23 Sep 2013

A Bayesian Network Approach for Combining Pitch and Reliable Spectral Envelope Features for Robust Speaker Verification
Mijail Arcienega ... Andrzej Drygajlo
01 Jan 2003

Influence of G729 Speech Coding on Automatic Speaker Recognition in VoIP Applications
Dalila Yessad ... Mohamed Debyeche
10 Dec 2011

Cluster adaptive training weights as features in SVM-based speaker verification
Hao Yang ... Xianyu Zhao
27 Aug 2007