A technique for adjusting Gaussian mixture model weights that improves speaker identification performance in the presence of phonemic train/test mismatch.

Jack McLaughlin, Lane Owsley

doi:10.1121/1.3587917

Abstract

Speaker identification is complicated by cases where training material is phonemically deficient. Misclassifications can result either because subsequent test material from that speaker contains primarily the phonemes missing from the training data or because that test material is phonemically most consistent with another talker’s model. This situation can arise in any dialog where, for reasons of brevity and clarity, conventions must be imposed on phraseology. We present here a technique for detecting phonemic deficiencies in a speaker model, and then correcting that model to partially compensate for the biased training data. This technique relies upon a specially constructed universal background model (UBM) from which speaker models are adapted. This UBM is formed by weighting several dozen phoneme GMMs using EM training. As a result, each Gaussian component of the UBM (and of the resulting speaker models) corresponds to a specific phoneme. Analysis of the speaker model weights reveals whether the training data had the typical phonemic variety found in ordinary speech, and if it did not, the weights are adjusted. Using a specially designed corpus created from the TIMIT utterances, we show that this reweighting technique improves performance over non-reweighted models. Results are also given for the Air Traffic Control Corpus.
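
The detect-and-reweight idea can be illustrated with a minimal sketch. The abstract does not give the detection criterion or the adjustment formula, so everything specific here is an assumption made for illustration: the function names, the `ratio_floor` threshold, the `blend` interpolation factor, and the rule of pulling deficient components toward the UBM weights. The only property taken from the abstract is that each Gaussian component is tied to one phoneme, which lets the weights be summed per phoneme and compared against the UBM.

```python
from collections import defaultdict


def phoneme_mass(weights, component_phoneme):
    """Sum mixture weights over the components assigned to each phoneme."""
    mass = defaultdict(float)
    for w, ph in zip(weights, component_phoneme):
        mass[ph] += w
    return dict(mass)


def reweight(speaker_w, ubm_w, component_phoneme, ratio_floor=0.5, blend=0.5):
    """Hypothetical reweighting rule (not the authors' formula).

    A phoneme is flagged as deficient when its total weight in the speaker
    model falls below ratio_floor times its total weight in the UBM.
    Components of flagged phonemes are interpolated toward the UBM weights,
    and the result is renormalized to sum to 1.
    """
    spk_mass = phoneme_mass(speaker_w, component_phoneme)
    ubm_mass = phoneme_mass(ubm_w, component_phoneme)
    deficient = {
        ph for ph, m in ubm_mass.items()
        if spk_mass.get(ph, 0.0) < ratio_floor * m
    }
    adjusted = [
        (1.0 - blend) * s + blend * u if ph in deficient else s
        for s, u, ph in zip(speaker_w, ubm_w, component_phoneme)
    ]
    total = sum(adjusted)
    return [w / total for w in adjusted], deficient


# Toy example: four components, two per phoneme (labels are illustrative).
component_phoneme = ["aa", "aa", "s", "s"]
ubm_w = [0.25, 0.25, 0.25, 0.25]
speaker_w = [0.45, 0.45, 0.05, 0.05]  # training data nearly lacked "s"

new_w, deficient = reweight(speaker_w, ubm_w, component_phoneme)
print(deficient, new_w)
```

In this toy case the "s" components hold far less weight in the speaker model than in the UBM, so they are flagged and their weights are raised before renormalization. Only the mixture weights change; the component means and covariances are left as adapted.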
