Abstract

The measurement of the word error rate (WER) of a speech recognizer is valuable for the development of new algorithms but provides only limited information about the performance of the recognizer. We propose the use of a human reference standard to assess the performance of speech recognizers, so that the performance of a recognizer could be quoted as being equivalent to the performance of a human hearing speech subjected to X dB of degradation. This approach should have the major advantage of being independent of the database and speakers used for testing. Furthermore, it would allow factors beyond the word error rate to be measured, such as the performance within an interactive speech system. In this paper, we report on preliminary work to explore the viability of this approach. This has consisted of recording a suitable database for experimentation, devising a method of degrading the speech in a controlled way, and conducting two sets of experiments on listeners to measure their responses to degraded speech in order to establish a reference. Results from these experiments raise several questions about the technique but encourage us to proceed with comparisons against automatic recognizers.
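For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal sketch (not from the paper; the dynamic-programming formulation is the standard Levenshtein approach):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (S + D + I) / N, computed via word-level
    Levenshtein distance between reference and hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,              # substitution or match
                          d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1)  # insertion
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is one reason the paper argues it gives only a limited picture of recognizer performance.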
