Abstract

It is a well-established fact that human performance exceeds that of computers by orders of magnitude on a wide range of speech recognition tasks. However, there is a widespread belief that the gap between human and machine performance has narrowed considerably on restricted problems. Yet there are few extensive comparisons of performance on tasks involving large vocabulary continuous speech recognition (LVCSR) and low signal-to-noise ratios (SNRs). Human evaluations on LVCSR tasks highlight a number of interesting issues; for example, familiarity with the domain plays a crucial role in human performance. The authors conducted several experiments that extensively characterize human performance on LVCSR tasks over two standard evaluation corpora: ARPA's CSR'94 Spoke 10 and CSR'95 Hub 3. They demonstrate that human performance is at least an order of magnitude better than the best machine performance, and that human performance is fairly robust to a number of factors that typically degrade machine performance: SNR, speaking rate and style, microphone, and ambient noise. In fact, human performance remained remarkably consistent across evaluation paradigms, and to some extent was artificially limited by a listener's attention span.
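The abstract does not name the metric, but LVCSR comparisons of this era are conventionally scored by word error rate (WER), so the "order of magnitude" claim presumably refers to WER ratios (e.g., roughly 1% human versus 10%+ machine error on noisy Hub 3 data). As a minimal sketch of how that metric is computed, the following word-level Levenshtein alignment is illustrative only; the function name and toy transcripts are not from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution against a six-word reference: WER ~ 16.7%
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.2%}")
```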
