Abstract
A test is proposed to characterize the performance of speech recognition systems. The QuickSIN test is used by audiologists to measure the ability of humans to recognize continuous speech in noise. This test yields the signal-to-noise ratio at which individuals can correctly recognize 50% of the keywords in low-context sentences. It is argued that applying this metric to automatic speech recognizers grounds their speech-in-noise performance in human abilities. Here, it is demonstrated that the performance of modern recognizers, built using millions of hours of unsupervised training data, ranges from normal to mildly impaired in noise compared to human participants.
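As an illustration of how a 50% keyword-recognition threshold can be estimated from a fixed-level test, the following minimal sketch applies the Spearman-Kärber estimate to per-level keyword counts. The defaults (six sentences from 25 dB down to 0 dB SNR in 5 dB steps, five keywords per sentence) are assumptions about a typical QuickSIN list and are not stated in this abstract; the function name and parameters are hypothetical and may differ from the scoring procedure used in the paper.

```python
def snr50_spearman_karber(correct_per_level, start_snr_db=25.0,
                          step_db=5.0, words_per_level=5):
    """Estimate the SNR (dB) at which 50% of keywords are recognized.

    correct_per_level: keywords correctly recognized at each SNR level,
    ordered from the highest SNR to the lowest.
    The default level layout is an assumed QuickSIN-style list.
    """
    for count in correct_per_level:
        if not 0 <= count <= words_per_level:
            raise ValueError("each count must lie in [0, words_per_level]")
    total_correct = sum(correct_per_level)
    # Spearman-Karber: start + step/2 - step * (total correct / words per level)
    return start_snr_db + step_db / 2.0 - step_db * total_correct / words_per_level


# Hypothetical listener or recognizer: all keywords correct down to 10 dB,
# three of five at 5 dB, none at 0 dB.
print(snr50_spearman_karber([5, 5, 5, 5, 3, 0]))
```

Under these assumed defaults the estimate reduces to 27.5 minus the total number of keywords correct, so a lower value indicates better recognition in noise.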