Tuning to the speaker: dynamic adaptation of statistical parameters in isolated letter recognition

Richard M Stern, Moshe J Lasry

doi:10.1121/1.2019825

Abstract

The performance of a feature‐based, speaker‐independent recognition system can be improved by enabling the system to learn the acoustical characteristics of individual speakers. Even when features are designed to be speaker‐independent, it is typically observed that, for a given feature and pair of letters, within‐speaker variation can be less than between‐speaker variation. For example, across all speakers a given feature may have an expected value of 5 for the letter M and 10 for N, but a certain speaker may produce average values of 9 for M and 14 for N. In such cases it is necessary to adjust the statistical feature parameters to the individual speaker to obtain optimal recognition performance. This paper describes a dynamic adaptation procedure for updating expected feature values during recognition. The algorithm uses maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of measure values on a speaker‐by‐speaker basis. In updating these mean vectors, the algorithm makes use of the observations input thus far, the relative variability of the features' means within and across subjects, and the covariance of the mean vectors within and across the various letters or sets of letters. The use of tuning produced a dramatic decrease in error rates for certain speakers and letters. [Work supported by NSF.]
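
The M/N example above can be illustrated with a minimal sketch of a Gaussian MAP update of per-letter feature means. This is not the authors' implementation: it assumes a single scalar feature, Gaussian statistics, a known within-speaker variance, and a prior covariance over the letter means. The function name `map_update`, the variance values (9 and 1), and the 0.9 correlation between the M and N means are hypothetical, chosen for illustration only; just the means 5, 10, and 9 come from the abstract.

```python
def map_update(mean, cov, letter, x, within_var):
    """Return the MAP estimate of the per-letter means after one observation.

    mean       -- current estimates of the speaker's mean for each letter
    cov        -- current covariance of those means across letters
    letter     -- index of the letter that was just observed
    x          -- observed feature value for that letter
    within_var -- assumed within-speaker variance of the feature
    """
    n = len(mean)
    k = letter
    innovation = x - mean[k]                  # how far the speaker is from expectation
    denom = cov[k][k] + within_var            # total variance of the observation
    gain = [cov[i][k] / denom for i in range(n)]
    new_mean = [mean[i] + gain[i] * innovation for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov[k][j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Speaker-independent expectations from the abstract: M = 5, N = 10.
prior_mean = [5.0, 10.0]
# Assumed between-speaker variance of 9 per letter and correlation 0.9.
prior_cov = [[9.0, 8.1],
             [8.1, 9.0]]
within_var = 1.0                              # assumed within-speaker variance

# The speaker utters one M with feature value 9; no N has been heard yet.
mean, cov = map_update(prior_mean, prior_cov, 0, 9.0, within_var)
print("M: %.2f  N: %.2f" % (mean[0], mean[1]))  # M: 8.60  N: 13.24
```

Under these assumptions, a single M observation moves the estimate for M most of the way toward the speaker's value, because the within-speaker variance is small relative to the between-speaker variance. It also shifts the estimate for N toward 14 before any N has been observed, which is the role the covariance across letters plays in the description above.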
