Modeling the recognition of sine-wave sentences

Jon P Barker, Martin P Cooke

doi:10.1121/1.416995

Abstract

Listeners can recognize sine-wave replicas of utterances synthesized using three time-varying sinusoids [Remez, Science 212, 947–950 (1981)]. Amplitude comodulation of such sine-wave sentences (SWS) further improves intelligibility [Carrell and Opie, Percept. Psychophys. 52, 437–445 (1992)]. In this study, an automatic speech recognition (ASR) task was used to investigate two issues. First, is the increased intelligibility of comodulated SWS due to a greater resemblance to natural speech? Second, is it necessary to acquire new speech schemas for SWS, or can existing schemas be accessed using a different strategy? An ASR system trained and tested on SWS stimuli gave a word recognition rate of 85%, compared to 92% for a system trained and tested on natural speech. Comodulated SWS was also recognized at 85%. It appears that the information content of SWS is only a little below that of natural speech and is unaffected by comodulation. Testing SWS on models trained on natural speech resulted in low recognition (5%). A second experiment modified the recognition strategy using occluded speech recognition techniques [Green et al., ICASSP 401–404 (1995)] and gave a recognition rate of 46% for SWS tested on natural-speech models. These results suggest that SWS recognition does not necessarily rely on acquiring new SWS schemas.
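
As an illustration of the stimuli described above, the following is a minimal sketch of how a sine-wave replica might be synthesized from three time-varying frequency and amplitude tracks, with optional amplitude comodulation. It is not the authors' synthesis procedure: the function name `synthesize_sws`, the per-sample track layout, the 16 kHz sample rate, and the raised-cosine envelope used for comodulation are all assumptions made for this example.

```python
import numpy as np


def synthesize_sws(freq_tracks, amp_tracks, sample_rate=16000, comod_hz=None):
    """Sum time-varying sinusoids into a sine-wave replica (illustrative sketch).

    freq_tracks, amp_tracks: arrays of shape (n_sinusoids, n_samples) giving the
    instantaneous frequency (Hz) and amplitude of each sinusoid per sample.
    comod_hz: if given, all sinusoids share one amplitude envelope at this rate
    (an assumed raised-cosine envelope, chosen only for illustration).
    """
    freq_tracks = np.asarray(freq_tracks, dtype=float)
    amp_tracks = np.asarray(amp_tracks, dtype=float)

    # Integrate instantaneous frequency to obtain a continuous phase per sinusoid.
    phase = 2.0 * np.pi * np.cumsum(freq_tracks, axis=1) / sample_rate

    # Sum the amplitude-weighted sinusoids into a single waveform.
    signal = np.sum(amp_tracks * np.sin(phase), axis=0)

    if comod_hz is not None:
        # Apply the same envelope to every component (amplitude comodulation).
        t = np.arange(signal.shape[0]) / sample_rate
        envelope = 0.5 * (1.0 + np.cos(2.0 * np.pi * comod_hz * t))
        signal = signal * envelope

    return signal


# Usage: three steady tones standing in for formant tracks (hypothetical values).
n_samples = 1600
freqs = np.tile(np.array([[500.0], [1500.0], [2500.0]]), (1, n_samples))
amps = np.ones((3, n_samples))
plain = synthesize_sws(freqs, amps)
comodulated = synthesize_sws(freqs, amps, comod_hz=100.0)
```

In practice the tracks would follow the formant frequencies and amplitudes estimated from a natural utterance rather than the constant values used here.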
