Abstract

The modern era in speaker recognition started about 50 years ago at Bell Laboratories with the controversial invention of the voiceprint technique for speaker identification, which was based on expert analysis of speech spectrograms. Early speaker recognition research concentrated on finding acoustic-phonetic features effective in discriminating speakers. The first truly automatic text-dependent speaker verification systems were based on time contours, or templates, of speaker-specific acoustic features. An important element of these systems was the ability to time warp sample templates against model templates in order to provide useful comparisons. Most modern text-dependent speaker verification systems are based on statistical representations of acoustic features analyzed as a function of time over specified utterances, most notably the hidden Markov model (HMM) representation. Modern text-independent systems are based on vector quantization representations and, more recently, on Gaussian mixture model (GMM) representations. An important ingredient of statistically based systems is the use of likelihood ratio decision techniques that rely on speaker background models. Some recent research has shown how to extract higher-level features based on speaking behavior and combine them with lower-level acoustic features for improved performance. The talk will present these topics in historical order, showing the evolution of techniques.
