Abstract

The performance of a feature-based, speaker-independent recognition system can be improved by enabling the system to learn the acoustic characteristics of individual speakers. Even when features are designed to be speaker-independent, it is typically observed that, for a given feature and pair of letters, within-speaker variation can be smaller than between-speaker variation. For example, across all speakers a given feature may have an expected value of 5 for the letter M and 10 for N, while a particular speaker produces average values of 9 for M and 14 for N. That speaker's M tokens then lie closer to the speaker-independent mean for N than to the one for M, so they are likely to be misrecognized. In such cases the statistical feature parameters must be adjusted to the individual speaker to obtain optimal recognition performance. This paper describes a dynamic adaptation procedure that updates expected feature values during recognition. The algorithm uses maximum a posteriori (MAP) estimation to update, speaker by speaker, the mean vectors of sets of feature measurements. Each update draws on the observations received so far, the relative variability of the feature means within and across speakers, and the covariance of the mean vectors within and across the various letters or sets of letters. Speaker tuning produced a dramatic decrease in error rates for certain speakers and letters. [Work supported by NSF.]
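As a rough illustration of this kind of update, the Python sketch below assumes a conjugate Gaussian model, which may differ from the paper's actual formulation. A speaker's per-letter feature means are treated as drawn from a prior whose mean vector and covariance are estimated across speakers (the off-diagonal covariance terms capture how the means of different letters vary together from speaker to speaker), and each observed feature value is Gaussian around that speaker's mean with a known within-speaker variance. Under these assumptions the MAP estimate equals the posterior mean, which can be recomputed after every recognized token. The class name `MapMeanAdapter`, the one-feature-per-letter layout, and all numbers are hypothetical.

```python
import numpy as np


class MapMeanAdapter:
    """Hypothetical sketch: MAP adaptation of per-letter feature means for one speaker.

    Assumptions (not taken from the paper): one scalar feature per letter,
    a Gaussian prior on the speaker's mean vector estimated across speakers,
    and Gaussian within-speaker noise with a known variance per letter.
    """

    def __init__(self, prior_mean, prior_cov, within_var):
        self.prior_mean = np.asarray(prior_mean, dtype=float)
        # Across-speaker covariance of the letter means, stored as a precision matrix.
        self.prior_prec = np.linalg.inv(np.asarray(prior_cov, dtype=float))
        self.within_var = np.asarray(within_var, dtype=float)
        k = self.prior_mean.size
        # Sufficient statistics of this speaker's observations so far.
        self.counts = np.zeros(k)
        self.sums = np.zeros(k)

    def observe(self, letter, value):
        """Record one measured feature value for a recognized letter index."""
        self.counts[letter] += 1
        self.sums[letter] += value

    def means(self):
        """Return the current MAP (posterior-mean) estimate of the speaker's letter means."""
        # Posterior precision = prior precision + diagonal data precision (n_k / var_k).
        post_prec = self.prior_prec + np.diag(self.counts / self.within_var)
        # Precision-weighted combination of prior mean and observed sums.
        rhs = self.prior_prec @ self.prior_mean + self.sums / self.within_var
        return np.linalg.solve(post_prec, rhs)


# Letter 0 = M, letter 1 = N; speaker-independent means 5 and 10,
# with a strong (0.9) across-speaker correlation between the two means.
adapter = MapMeanAdapter(
    prior_mean=[5.0, 10.0],
    prior_cov=[[4.0, 3.6], [3.6, 4.0]],
    within_var=[1.0, 1.0],
)
print(adapter.means())  # approximately [5.0, 10.0] before any data
for _ in range(3):
    adapter.observe(0, 9.0)  # three tokens of M measured at 9
print(adapter.means())  # approximately [8.692, 13.323]
```

In this toy setting, three M tokens at 9 pull the M estimate to about 8.69 and also move the N estimate from 10 to about 13.32, even though no N token has been observed. This is the role the across-letter covariance plays: adaptation on one letter can carry over to letters the speaker has not yet produced.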
