Evaluation of selected acoustic parameters for use in speaker identification

E T Doherty

doi:10.1121/1.2001834

Abstract

The effectiveness of certain acoustic and temporal properties of the speech signal, namely long-term power spectra (LTS), speaking fundamental frequency (SFF), and speaking time (ST), in determining a speaker's identity from his voice alone was tested for each property alone and in various combinations. Further, the effects of distortions (limited passband, stress, or disguise) were evaluated. Various analytical procedures (Euclidean distance, cross-correlation, and discriminant analysis) were used. Two groups were selected: 50 college-age males who read “normally,” and 25 males, aged 25–45, who read normally, while subjected to stress, and while attempting voice disguise. Acoustic/temporal analyses were performed on the speakers' utterances to extract the LTS, SFF, and ST vectors. Filtering was simulated for LTS. Results indicated that (1) the LTS vector is extremely effective for identifying speech produced normally, (2) SFF and ST were far less effective, (3) combining vectors usually improved correct identification levels, (4) when talkers were under stress or attempting a disguise, no single vector or combination adequately differentiated them, and (5) discriminant analysis is a better method of determining identity than cross-correlation or Euclidean distance.
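
To make the comparison procedures named in the abstract concrete, the following is a minimal sketch of how a test vector might be matched against stored reference vectors by Euclidean distance or by correlation. It is not the study's implementation. The function names, the five-element vectors, and the speaker labels are hypothetical, and the correlation shown is a zero-lag Pearson correlation, which is an assumption about how the cross-correlation could be computed. Discriminant analysis is not shown.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a, b):
    """Zero-lag Pearson correlation (assumed form of the cross-correlation)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def identify(test_vector, references, method="euclidean"):
    """Return the label of the reference vector that best matches test_vector."""
    if method == "euclidean":
        # Smallest distance is the best match.
        return min(references,
                   key=lambda name: euclidean_distance(test_vector, references[name]))
    if method == "correlation":
        # Largest correlation is the best match.
        return max(references,
                   key=lambda name: correlation(test_vector, references[name]))
    raise ValueError(f"unknown method: {method}")


# Hypothetical LTS-style vectors (band levels in dB); not data from the study.
references = {
    "speaker_a": [62.0, 58.0, 51.0, 44.0, 40.0],
    "speaker_b": [60.0, 59.0, 55.0, 50.0, 41.0],
    "speaker_c": [65.0, 55.0, 47.0, 45.0, 43.0],
}
test_vector = [61.5, 58.5, 51.5, 44.5, 39.5]

print(identify(test_vector, references, method="euclidean"))
print(identify(test_vector, references, method="correlation"))
```

With these illustrative vectors, both criteria select the same reference. In practice the two can disagree, because Euclidean distance is sensitive to overall level while correlation responds only to the shape of the vector.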
