Abstract

The effectiveness of certain acoustic and temporal properties of the speech signal, namely long-term power spectra (LTS), speaking fundamental frequency (SFF), and speaking time (ST), in determining a speaker's identity from his voice alone was tested singly and in various combinations. Further, the effect of distortions (limited passband, stress, or disguise) was evaluated. Various analytical procedures (Euclidean distance, cross-correlation, or discriminant analysis) were used. Two groups were selected: 50 college-age males who read “normally”, and 25 males, aged 25–45, who read normally, while subjected to stress, and while attempting voice disguise. Acoustic/temporal analyses were performed on the speakers' utterances to extract the LTS, SFF, and ST vectors. Filtering was simulated for LTS. Results indicated that (1) the LTS vector is extremely effective for identifying speech produced normally, (2) SFF and ST were far less effective, (3) combining vectors usually improved correct identification levels, (4) when speakers were under stress or attempting a disguise, no single vector or combination adequately differentiated talkers, and (5) discriminant analysis is a better method of determining identity than cross-correlation or Euclidean distance.
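As an illustration of two of the comparison procedures named in the abstract, the following minimal sketch matches a test vector against stored reference vectors by Euclidean distance and by correlation. It is not the study's procedure: the function names, the five-band vectors, and the speaker labels are all hypothetical, the correlation shown is a plain zero-lag Pearson coefficient, and discriminant analysis is not covered.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a, b):
    """Zero-lag Pearson correlation between two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def identify(test_vector, references, method="euclidean"):
    """Return the name of the reference speaker that best matches test_vector.

    'euclidean' picks the smallest distance; 'correlation' picks the
    largest correlation coefficient.
    """
    if method == "euclidean":
        return min(
            references,
            key=lambda name: euclidean_distance(test_vector, references[name]),
        )
    if method == "correlation":
        return max(
            references,
            key=lambda name: correlation(test_vector, references[name]),
        )
    raise ValueError(f"unknown method: {method}")


# Hypothetical toy data: band levels (dB) in five frequency bands per speaker.
# These numbers are invented for illustration and are not from the study.
references = {
    "speaker_A": [62.0, 58.0, 51.0, 44.0, 40.0],
    "speaker_B": [60.0, 61.0, 55.0, 42.0, 35.0],
    "speaker_C": [65.0, 54.0, 50.0, 48.0, 43.0],
}
test_vector = [61.5, 58.5, 51.5, 43.5, 40.5]

print(identify(test_vector, references, method="euclidean"))
print(identify(test_vector, references, method="correlation"))
```

Both procedures reduce identification to a nearest-reference decision, so they differ only in the similarity measure. Discriminant analysis additionally weights the features by how well they separate speakers.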
