Abstract
This paper reports on the development and advances in automatic speech recognition for the AT&T Speak4it® voice-search application. With Speak4it as real-life example, we show the effectiveness of acoustic model (AM) and language model (LM) estimation (adaptation and training) on relatively small amounts of application field-data. We then introduce algorithmic improvements concerning the use of sentence length in LM, of non-contextual features in AM decision-trees, and of the Teager energy in the acoustic front-end. The combination of these algorithms, integrated into the AT&T Watson recognizer, yields substantial accuracy improvements. LM and AM estimation on field-data samples increases the word accuracy from 66.4% to 77.1%, a relative word error reduction of 32%. The algorithmic improvements increase the accuracy to 79.7%, an additional 11.3% relative error reduction.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.