Abstract

Gaussian mixture models (GMMs) have recently been employed to provide a robust technique for speaker identification. Determining the appropriate number of Gaussian components in a model for adequate speaker representation is a crucial but difficult problem. This number is in fact speaker dependent, so assuming a fixed number of Gaussian components for all speakers is not justified. In this paper, we develop a procedure for roughly estimating the maximum possible model order, above which the estimation of model parameters becomes unreliable. In addition, a theoretical measure, namely a goodness-of-fit (GOF) measure, is derived and used to estimate the number of Gaussian components needed to characterize different speakers. The estimation is carried out by exploiting the distribution of the training data for each speaker. Experimental results indicate that the proposed technique provides results comparable to other well-known model selection criteria such as the minimum description length (MDL) and the Akaike information criterion (AIC).

Highlights

  • The speech signal is believed to be among the fastest means of transmitting information between humans and machines

  • In speech recognition, the emphasis is on recognizing words and phrases in a spoken utterance, while speaker recognition is concerned with extracting the identity of the person speaking the utterance

  • This section compares the performance of the proposed algorithm to two well-known model order selection criteria, minimum description length (MDL) [14] and Akaike information criterion (AIC) [15]

Summary

INTRODUCTION

The speech signal is believed to be among the fastest means of transmitting information between humans and machines. Open-set speaker identification includes the additional possibility that a speaker may be outside the given set of speakers [1]. Another distinguishing feature of speaker recognition is whether it is text-dependent or text-independent. Over the past several years, Gaussian mixture models (GMMs) have become the dominant approach for modeling in text-independent speaker recognition applications. This is evidenced by the numerous research works on the use of GMMs for speaker identification and verification tasks [3, 4, 5, 6]. GMMs have been shown to efficiently represent speaker-dependent acoustic features. In this method, each speaker is represented by a single model.
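As a rough illustration of per-speaker model order selection, the sketch below scores candidate GMM orders with the two baseline criteria named in the abstract, AIC and MDL. It does not reproduce the paper's GOF measure. The diagonal-covariance parameter count, the common two-part form of MDL, the function names, and the log-likelihood values are all assumptions made for this example, not details taken from the paper.

```python
import math


def gmm_free_params(n_components, n_dims):
    """Free parameters of a diagonal-covariance GMM (an assumption here):
    (M - 1) mixture weights + M*D means + M*D variances."""
    return (n_components - 1) + 2 * n_components * n_dims


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def mdl(log_likelihood, n_params, n_samples):
    """Two-part minimum description length: lower is better."""
    return -log_likelihood + 0.5 * n_params * math.log(n_samples)


def select_order(log_likelihoods, n_dims, n_samples, criterion="mdl"):
    """Pick the model order with the lowest criterion score.

    log_likelihoods maps a candidate order M to the total training
    log-likelihood of an M-component GMM fitted to one speaker's data.
    """
    scores = {}
    for m, ll in log_likelihoods.items():
        k = gmm_free_params(m, n_dims)
        if criterion == "aic":
            scores[m] = aic(ll, k)
        else:
            scores[m] = mdl(ll, k, n_samples)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods for one speaker (illustrative values only).
example_lls = {4: -52000.0, 8: -50500.0, 16: -50000.0, 32: -49900.0}
best_aic, _ = select_order(example_lls, n_dims=12, n_samples=5000, criterion="aic")
best_mdl, _ = select_order(example_lls, n_dims=12, n_samples=5000, criterion="mdl")
print("AIC picks", best_aic, "components; MDL picks", best_mdl)
```

With these made-up numbers, MDL selects a smaller model than AIC because its penalty grows with the logarithm of the number of training frames. In practice the log-likelihoods would come from GMMs trained separately on each speaker's data, so the selected order can differ from speaker to speaker.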

GAUSSIAN MIXTURE MODEL
Model description
Parameter estimation and model training
THE PROPOSED TECHNIQUE
GOF measure
Finding an upper bound for the model order
GOF-based training algorithm
Database development
Feature extraction
Performance evaluation
Findings
CONCLUSIONS