Abstract

The HEAR acoustic processor combines standard frequency-domain and cycle-synchronous time-domain parameters. Output segments, usually 10 msec in length, vary dynamically from 0.1 msec to over 100 msec to capture significant events in the underlying acoustic phone structure. Segment labels are determined by matching against a set of about 200 automatically selected prototypes. Some statistics on the fraction of segments correctly labeled (from a choice of 52 labels) and their most likely confusions are included. Speech recognition results obtained using the HEAR acoustic processor in conjunction with the training and decoding procedures of the IBM Research Continuous Speech Recognition mainline system are presented. On a set of 125 test sentences (1010 words) of the New Raleigh Language (artificial language, 250-word vocabulary, perplexity 7.27), the sentence recognition rate is 100%. On a set of 10 test sentences (282 words) of the Laser-1000 Language (natural language, 1000-word vocabulary, perplexity 21.1), the word recognition rate is 80%. Although it is generally difficult to ascribe errors to specific system components, three classes of errors are observed: 1) the correct word is not hypothesized, so no acoustic match is performed (10.3% of words); 2) the correct word is hypothesized, but the search is pruned before longer phrases containing it are constructed (6.4%); 3) the correct word is hypothesized, fully matched, and rejected in favor of an incorrect word (3.2%). Errors of the third class consist exclusively of short function words (e.g. the, of; 2.2%) and deleted commas, realized acoustically as optional interword pauses (1.0%).
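The perplexity figures quoted above (7.27 and 21.1) measure the average branching factor the language model presents to the recognizer. As a hedged illustration (not part of the paper itself), a minimal sketch of the standard definition, perplexity as 2 raised to the average negative log2 probability per word, might look like:

```python
import math

def perplexity(word_probs):
    """Perplexity = 2 ** (average negative log2 probability per word).

    `word_probs` holds the language model's probability for each word
    of a test sequence; a lower value means a more predictable language
    (an easier recognition task).
    """
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** avg_neg_log2

# A uniform choice among 8 equally likely words gives perplexity 8.
print(perplexity([1 / 8] * 5))  # → 8.0
```

Under this definition, a perplexity of 7.27 means the 250-word New Raleigh Language constrains each word choice roughly as much as a uniform choice among about 7 words would.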
