Abstract

A new generation of low-cost speech recognition devices is appearing, offering much promise for useful applications in biomedical engineering. These devices are statistical pattern recognizers: input utterances are classified by comparison with a set of templates derived during ‘speaker training’. For these devices to be useful, recognition accuracy must be high and speaker training must not be unacceptably complicated or tedious. This paper investigates techniques that exploit the statistical nature of the input utterances to improve recognition accuracy. Word classification based on the Mahalanobis distance metric, using templates derived from cluster analysis of the training inputs, was found to give results superior to the other strategies studied. This classifier was unsuitable for implementation in a real-time, low-cost system, but the principle of clustering was successfully applied to produce an adaptive system that tracked changes in the user's voice. Because templates were updated during normal operation, training could be drastically simplified. The adaptive system achieved 98.8% recognition accuracy on a 32-word vocabulary, compared with 94.8% without adaptation.
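
As an illustration only, the two ideas named above (nearest-template classification under a Mahalanobis distance, and template updating during use) might be sketched as follows. The abstract does not give the paper's feature representation, covariance estimate, or update rule, so everything here is an assumption: the covariance is taken to be diagonal for simplicity, the adaptation step is a simple moving average with a hypothetical `rate` parameter, and the template values are invented toy numbers rather than data from the study.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from x to a template, assuming a diagonal
    covariance matrix (one variance per feature). This is a simplification."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def classify(x, templates):
    """Return the label of the template with the smallest distance to x."""
    return min(templates, key=lambda label: mahalanobis_diag(x, *templates[label]))

def adapt(templates, label, x, rate=0.1):
    """Shift the recognized word's template mean toward x.
    Hypothetical update rule; the paper's actual rule is not stated here."""
    mean, var = templates[label]
    new_mean = [m + rate * (xi - m) for m, xi in zip(mean, x)]
    templates[label] = (new_mean, var)

# Toy templates: label -> (mean vector, per-feature variances).
# In a clustering-based scheme the means would be cluster centroids.
templates = {
    "yes": ([1.0, 2.0], [0.5, 0.5]),
    "no": ([4.0, 0.0], [0.5, 2.0]),
}

utterance = [1.5, 1.5]
word = classify(utterance, templates)   # nearest template under the metric
adapt(templates, word, utterance)       # update that template during use
```

In this sketch each recognized utterance nudges its template toward the speaker's current pronunciation, which is one plausible way a system could track gradual changes in a user's voice without a separate retraining session.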
