Word recognition by humans and machines: Tests on a multitalker, multistyle database

Patricia K Kuhl, Kerry P Green, Caroline Fu, John W Gordon, David L Sanford

doi:10.1121/1.2027648

Abstract

Experiments comparing isolated word recognition by human listeners with automatic speech recognition systems are valuable because error analyses may lead to improvements in speech recognition technology. Isolated word recognition by adult human listeners was compared with recognition performance by two commercially available speech-recognition systems. The test stimuli were drawn from the Lincoln Laboratory Stressed-Speech database. The database consists of 6930 stimuli (two iterations of each of 35 words spoken by nine different people in 11 different speaking styles). The vocabulary contains confusable words (e.g., go, hello, oh, no, and zero); the speaking styles include a wide range of naturally occurring variations (e.g., normal, slow, fast, soft, loud, angry). Analyses show that the acoustic characteristics of individual words vary considerably across talkers, and across styles within a single talker. The performance of human listeners and of the two machine-based recognition systems was tested in a single-talker, multistyle condition and in a multitalker, multistyle condition. All tests were conducted under two listening conditions: normal (no added noise) and in the presence of masking noise. The data to be presented compare the error patterns of human listeners with those of the machine-recognition systems across talkers, across speaking styles, and across training conditions (multitalker, multistyle training versus single-talker, single-style training). [Work supported by Boeing Aerospace and Electronics.]
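The stated database size follows from a fully crossed design: 2 iterations × 35 words × 9 talkers × 11 styles = 6930 stimuli. The sketch below is illustrative only. It assumes every combination of factors was recorded exactly once, which the stated total implies. Because the abstract names only some of the words and styles, the integer IDs stand in for the actual labels and are hypothetical. The sketch also shows how restricting the data to one talker yields the stimulus pool for a single-talker, multistyle condition.

```python
from itertools import product

# Factor sizes as stated in the abstract.
N_ITERATIONS = 2
N_WORDS = 35
N_TALKERS = 9
N_STYLES = 11


def enumerate_stimuli():
    """Return one (iteration, word, talker, style) tuple per stimulus.

    Integer IDs are hypothetical placeholders; the abstract lists only
    a few of the 35 words and 6 of the 11 speaking styles by name.
    """
    return list(product(range(N_ITERATIONS), range(N_WORDS),
                        range(N_TALKERS), range(N_STYLES)))


stimuli = enumerate_stimuli()
print(len(stimuli))  # 6930

# A single-talker, multistyle pool keeps every style but only one talker
# (talker 0 here, chosen arbitrarily): 2 * 35 * 11 stimuli.
single_talker = [s for s in stimuli if s[2] == 0]
print(len(single_talker))  # 770
```

The same filtering pattern, applied to the style index instead of the talker index, would give a single-style subset.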