Abstract

The principal objective of a phonetic approach is to reduce the influence of intra- and inter-speaker speech variations on the recognition rate. The system is implemented on a 16K minicomputer. It uses a filter bank that delivers spectral sections covering 0–5 kHz every 10 ms. Estimates of the first three formants are calculated, and the energies in different spectral bands are used to segment the speech signal into broad classes. Depending on the segmental class and the speech parameter, the following measures are calculated: mean values, steady-state values, durations, transition rates, and some distances between formants. In a learning phase, a program is given the quasiphonetic spelling of the input words and automatically calculates the statistics of these measures for the vocabulary used. The statistics are based on phoneme pairs, i.e., diphones. In the recognition phase, the program uses the statistics and the quasiphonetic spelling to recognize the input words. Six male speakers were used to calculate the statistics for a 41-word vocabulary. Their mean recognition rate on a new recording was 98%. For four male talkers unknown to the system, the rate decreased to 96.3%. [Work supported by STU, Sweden.]
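The learning and recognition phases can be illustrated with a minimal sketch. The abstract does not state how the diphone statistics are modeled or compared, so this sketch makes several assumptions. Each measure is summarized per diphone by a mean and a variance. Recognition picks the vocabulary word with the highest diagonal-Gaussian log-likelihood. The input is assumed to be already segmented into one measure vector per diphone. The functions `learn` and `recognize` are hypothetical, as are the words and the (F1, F2) values. This is a sketch of the general idea, not the author's method.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical data layout (not from the paper): a spelling is a list of
# phoneme symbols, and each utterance has one measure vector per diphone.

def diphones(spelling):
    """Split a quasiphonetic spelling into adjacent phoneme pairs (diphones)."""
    return list(zip(spelling, spelling[1:]))

def learn(training):
    """Learning phase: per-diphone mean and variance of every measure."""
    samples = defaultdict(list)
    for spelling, measures in training:
        for dp, vec in zip(diphones(spelling), measures):
            samples[dp].append(vec)
    stats = {}
    for dp, vecs in samples.items():
        # Floor the variance so a diphone seen only once still gets a usable model.
        stats[dp] = [(mean(col), max(pvariance(col), 1e-3)) for col in zip(*vecs)]
    return stats

def log_likelihood(vec, stat):
    """Diagonal-Gaussian log-likelihood of one measure vector (an assumed scoring rule)."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x, (mu, var) in zip(vec, stat))

def recognize(measures, vocabulary, stats):
    """Recognition phase: return the word whose diphone statistics best fit the measures."""
    best, best_score = None, -math.inf
    for word, spelling in vocabulary.items():
        dps = diphones(spelling)
        # Simplification: only compare words with a matching diphone count
        # and with statistics available for every diphone.
        if len(dps) != len(measures) or any(dp not in stats for dp in dps):
            continue
        score = sum(log_likelihood(v, stats[dp]) for v, dp in zip(measures, dps))
        if score > best_score:
            best, best_score = word, score
    return best

# Toy example with made-up (F1, F2) values in Hz for two hypothetical words.
vocabulary = {"ja": ["j", "a"], "nu": ["n", "u"]}
training = [
    (["j", "a"], [[250.0, 2100.0]]),
    (["j", "a"], [[270.0, 2200.0]]),
    (["n", "u"], [[400.0, 900.0]]),
    (["n", "u"], [[420.0, 1000.0]]),
]
stats = learn(training)
print(recognize([[260.0, 2150.0]], vocabulary, stats))  # ja
```

Keying the statistics on diphones rather than on whole words lets a new word be recognized from its quasiphonetic spelling alone, provided its diphones occurred in the training material. This may be one reason the abstract emphasizes the spelling-driven learning phase.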
