Abstract

The accuracy of automatic speech recognition (ASR) systems is generally evaluated on corpora of grammatically sound read speech or natural spontaneous speech. This prevents an accurate estimate of the performance of the acoustic modeling part of ASR, because language modeling performance is inherently folded into the overall performance metric. Although acoustic modeling accuracy can be evaluated on these corpora using a null grammar language model, the result cannot be compared with human speech recognition (HSR), since human listeners cannot be asked to ignore grammar. In this work, a null grammar speech corpus was collected for comparing HSR and ASR. The corpus was recorded in a quiet hemi-anechoic chamber using three vocabulary sizes (1000, 5000, and 10000). Noisy speech files at different signal-to-noise ratios were generated by adding noise at different levels to the quiet speech recordings. Human listeners transcribed the recordings, and their accuracy was compared with that of an ASR system across vocabulary sizes and noise levels.
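
The noise-mixing step described above can be illustrated with a minimal sketch. The abstract does not specify the noise type, the SNR values, or how power was measured, so everything here is an assumption for illustration: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, power is taken as the mean squared amplitude over the whole signal, and the signals are synthetic stand-ins rather than corpus recordings.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB.

    Assumption: SNR is defined over the whole utterance as
    10 * log10(speech_power / noise_power).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = mean_power(speech)
    noise_power = mean_power(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Noise power needed to reach the target SNR, and the amplitude gain for it.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mix, given the clean signal it was built from."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a 440 Hz tone at 16 kHz as "speech", Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr in (20, 10, 0):  # hypothetical SNR levels, not the paper's
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the speech level constant across conditions, which is one common choice; the source does not state which signal was scaled.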
