Abstract

A speaker identification system is provided that constructs speaker models using a discriminant analysis technique where the data in each class is modeled by Gaussian mixtures. The speaker identification method and apparatus determines the identity of a speaker, as one of a small group, based on a sentence-length password utterance. A speaker's utterance is received and a sequence of a first set of feature vectors are computed based on the received utterance. The first set of feature vectors are then transformed into a second set of feature vectors using transformations specific to a particular segmentation unit, and likelihood scores of the second set of feature vectors are computed using speaker models trained using mixture discriminant analysis. The likelihood scores are then combined to determine an utterance score and the speaker's identity is validated based on the utterance score. The speaker identification method and apparatus also includes training and enrollment phases. In the enrollment phase the speaker's password utterance is received multiple times. A transcription of the password utterance as a sequence of phones is obtained, and the phone string is stored in a database containing phone strings of other speakers in the group. In the training phase, the first set of feature vectors are extracted from each password utterance and the phone boundaries for each phone in the password transcription are obtained using a speaker independent phone recognizer. A mixture model is developed for each phone of a given speaker's password. Then, using the feature vectors from the password utterances of all of the speakers in the group, transformation parameters and transformed models are generated for each phone and speaker, using mixture discriminant analysis.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call