Abstract

This paper describes a system for the automatic recognition of digits spoken in connected strings. The input is standard telephone-quality audio, band-limited to 200–3200 Hz, spoken in a room with a background noise level of 60 dBA. The first step segments the incoming string into syllables. Two features of the syllabification algorithm [D. Kahn, J. Acoust. Soc. Am. Suppl. 1 73, S88 (1983)] are important: (a) the analysis is based solely on the energy contour, with no spectral information, and (b) syllables judged to be unstressed are appended to the preceding stressed syllable. The latter is relevant only in the case of “seven,” and makes the syllabic analysis equivalent to a segmentation into digits. The remaining analysis centers on an Itakura-type dynamic-programming match between each extracted segment and a set of stored digit templates. The templates are themselves derived from segments extracted from telephone-quality connected-digit strings.
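The two stages described above (energy-only segmentation, then a dynamic-programming match against stored templates) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give frame lengths, thresholds, feature vectors, or the distance measure, so `frame_len`, `threshold`, the tuple-valued features, and the Euclidean local distance are all assumptions. The match shown is plain dynamic time warping and omits the slope constraints of an Itakura-type match; the merging of unstressed syllables is also left out.

```python
import math


def frame_energies(samples, frame_len):
    """Short-time energy: sum of squares over non-overlapping frames.

    frame_len is a hypothetical parameter; the abstract does not specify it.
    """
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_by_energy(energies, threshold):
    """Return (start, end) frame ranges whose energy stays above threshold.

    Uses only the energy contour, with no spectral information.
    The fixed threshold is an assumption made for illustration.
    """
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i                      # segment begins
        elif e <= threshold and start is not None:
            segments.append((start, i))    # segment ends (end is exclusive)
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


def dtw_distance(segment, template):
    """Accumulated distance of the best plain-DTW alignment.

    segment and template are sequences of equal-dimension feature tuples.
    Local distance is Euclidean (an assumption); no slope constraints.
    """
    inf = float("inf")
    n, m = len(segment), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(segment[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat template frame
                                 cost[i][j - 1],      # repeat segment frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(segment, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))
```

In use, each frame range returned by `segment_by_energy` would be converted to a feature sequence and passed to `recognize` with a dictionary mapping digit labels to template feature sequences.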
