Abstract

A text-independent speaker identification system was developed using two male and two female speakers with similar accents. Each read a phonetically balanced list of ten sentences. The system was trained repeatedly on a rotated set of nine sentences and tested on the remaining one. Speech during successive 16-ms windowed time slices was described by 14 cepstral coefficients. Unit direction vectors were used to characterize each sentence from each speaker. A nearest-centroid, nearest-neighbor, or improved perceptron neural net training procedure was used to define decision regions. When the data were preprocessed to remove time slices that were similar across all speakers, discriminability was enhanced and errorless identification was obtained. The success of this system appears to result primarily from the ability of the cepstral coefficients to capture the speaker-dependent information in the higher formants and from the accentuation of this information by the preprocessor.
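The pipeline described above (windowed time slices, cepstral coefficients, unit direction vectors, nearest-centroid classification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a real-cepstrum feature (inverse FFT of the log magnitude spectrum), a Hamming analysis window, and Euclidean distance to speaker centroids; the exact cepstral variant, window, and distance measure used in the paper may differ.

```python
import numpy as np

def cepstral_features(frame, n_coeffs=14):
    """Real cepstrum of one windowed time slice: inverse FFT of the
    log magnitude spectrum. Keeps the first n_coeffs coefficients,
    skipping c0 (overall energy), as a speaker feature vector."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    return cepstrum[1:n_coeffs + 1]

def unit_vector(v):
    """Normalize a feature vector to unit length, so only its
    direction (spectral shape, not level) is used for comparison."""
    return v / np.linalg.norm(v)

def nearest_centroid(x, centroids):
    """Assign feature vector x to the speaker whose centroid is
    closest in Euclidean distance."""
    dists = [np.linalg.norm(x - c) for c in centroids]
    return int(np.argmin(dists))
```

In use, each training sentence would contribute many 16-ms slices; averaging a speaker's unit vectors gives that speaker's centroid, and a test slice is labeled by `nearest_centroid`.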
