Abstract

It is a well-established fact that human performance exceeds that of computers by orders of magnitude on a wide range of speech recognition tasks. However, there is a widespread belief that the gap between human and machine performance has narrowed considerably on restricted problems. Yet there are few extensive comparisons of performance on tasks involving large vocabulary continuous speech recognition (LVCSR) and low signal-to-noise ratios (SNRs). Human evaluations on LVCSR tasks highlight a number of interesting issues; for example, familiarity with the domain plays a crucial role in human performance. The authors conducted several experiments that extensively characterize human performance on LVCSR tasks over two standard evaluation corpora: ARPA's CSR'94 Spoke 10 and CSR'95 Hub 3. They demonstrate that human performance is at least an order of magnitude better than the best machine performance, and that human performance is fairly robust to a number of factors that typically degrade machine performance: SNR, speaking rate and style, microphone, and ambient noise. In fact, human performance remained remarkably consistent across evaluation paradigms, and to some extent was artificially limited by a listener's attention span.
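The abstract does not name the metric, but LVCSR comparisons of this era are conventionally scored by word error rate (WER), so the "order of magnitude" claim presumably refers to WER ratios (e.g., roughly 1% human versus 10%+ machine error on noisy Hub 3 data). As a minimal sketch of how that metric is computed, the following word-level Levenshtein alignment is illustrative only; the function name and toy transcripts are not from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution against a six-word reference: WER ~ 16.7%
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.2%}")
```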
