Abstract

The authors have developed a method of speaker identification based on long-term statistical measures of speech. That is, n-dimensional Euclidean distances between long-term speech spectra are calculated and used in the identification of speakers. In this experiment, the technique was evaluated to determine whether it was resistant to the effects of variations in the mode of speech production and signal transmission. Three different speaking conditions were studied: (a) normal speech, (b) speech under stress (talkers were subjected to randomly distributed electric shocks while speaking), and (c) disguised speech (talkers were permitted to disguise their speech in any manner they chose except by whispering or the use of a foreign dialect). Moreover, the entire procedure was replicated for a restricted passband of 300–3500 Hz; this band is similar to that found in telephone transmissions. Speech samples were obtained from 25 adult American males who read Stevenson's “Apology for Idlers” under the three different experimental conditions. A portion of the tape-recorded reading was analyzed in 1/3-octave bands by means of a GR-1925 Multifilter and a GR-1926 Multichannel rms Detector; four speech samples of 32-sec duration were analyzed for each subject and experimental condition. Further processing of the spectral results (expressed in decibel levels for each frequency band) was carried out on an IBM 370/175 computer. The normalized data were used to obtain Euclidean distances; in turn, these distances were utilized to evaluate both intra- and interspeaker variations in the speech spectra for the different speaking and (parallel) passband conditions. The normalized mean values of the four subsamples produced by each speaker under all conditions constituted the set of reference samples, and two different sets of test samples were studied.
The reference data were used to discriminate among the speakers in the normal speaking mode; they were then used in an attempt to identify the speakers in the stress and disguise conditions. The entire process was replicated for the passband condition. The results for the normal mode demonstrated a relatively high level of correct speaker identification (slightly over 90%); the correct identification level was reduced by nearly 20% for the passband mode. The identification levels for the stress condition were nearly as high as they were for the normal mode, but the correct speaker identification level for disguise was little better than chance. Replication of the procedures for the passband condition resulted in a slight further degradation for the stress condition but a marked improvement with respect to disguise. It appears that this method can identify individuals from their speech reasonably well when they are speaking normally or under stress; it cannot do so, however, when they attempt to disguise their voices.
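
The distance-based matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the normalization, so subtracting each spectrum's mean decibel level is an assumption, as is the nearest-reference decision rule, and the speaker names and band levels are invented for illustration (real spectra would have one value per 1/3-octave band).

```python
import math

def normalize(levels):
    """Remove the overall level from a spectrum given in dB per band.
    ASSUMPTION: mean subtraction; the abstract does not state the
    normalization that was actually used."""
    mean = sum(levels) / len(levels)
    return [x - mean for x in levels]

def mean_spectrum(samples):
    """Average several subsample spectra band by band."""
    n = len(samples)
    return [sum(band) / n for band in zip(*samples)]

def euclidean_distance(a, b):
    """n-dimensional Euclidean distance between two spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(test_levels, references):
    """Return the reference speaker whose normalized spectrum is closest
    to the test spectrum (ASSUMED nearest-reference decision rule)."""
    test_n = normalize(test_levels)
    return min(references,
               key=lambda name: euclidean_distance(test_n, references[name]))

# Hypothetical data: two speakers, two subsamples each, four bands.
references = {
    "speaker_A": normalize(mean_spectrum([[60.0, 55.0, 50.0, 45.0],
                                          [62.0, 57.0, 50.0, 47.0]])),
    "speaker_B": normalize(mean_spectrum([[50.0, 58.0, 56.0, 44.0],
                                          [52.0, 60.0, 56.0, 44.0]])),
}

# A louder test sample with the same spectral shape as speaker_A.
test_sample = [66.0, 61.0, 55.0, 51.0]
print(identify(test_sample, references))
```

Under this assumed normalization, a test sample that differs from a reference only in overall level has zero distance to it, which is why the louder sample above is still matched to `speaker_A`.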
