Abstract

Accurate and robust connected digit recognition is essential for a wide range of telecommunication services. With training and testing on clean network digit data only, and with the same whole‐word model architecture as in the TI/NIST connected digit testing, the string error rate increased from less than 1% to more than 5%. Performance degraded even further when the models were evaluated on data collected under different network conditions. Most of the observed errors were caused by changing channel characteristics, highly variable digit pronunciations, and inadequate modeling of cross‐digit coarticulation. Results are presented for a number of context‐dependent whole‐word and subword modeling techniques developed to overcome some of these problems. The most effective is a new acoustic subword modeling approach in which each digit model consists of three parts: head, body, and tail subword units. Multiple heads and tails are allowed per digit, one for each of the 11 possible preceding and following digits and the background. Cross‐digit coarticulation is modeled by connecting each pair of adjacent digits through the corresponding tail and head units. In tests on about 12 000 digit strings collected from five regions, this new model architecture reduced the string error rate to under 2%.
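
The head-body-tail decomposition can be illustrated with a minimal sketch. It is one reading of the abstract, not the authors' implementation. It assumes the 11-word vocabulary is "zero", "oh", and "one" through "nine", that the body unit is context independent, and that string boundaries use the background context. The unit naming scheme and the function names `expand` and `inventory` are hypothetical.

```python
# Assumed 11-word digit vocabulary (the abstract only says "11 possible" digits).
DIGITS = ["zero", "oh", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
BACKGROUND = "bg"  # hypothetical label for the background (silence) context
CONTEXTS = DIGITS + [BACKGROUND]


def head(digit, prev):
    """Head unit of `digit`, conditioned on the preceding context."""
    return f"{digit}.head<{prev}"


def body(digit):
    """Body unit of `digit`, assumed here to be context independent."""
    return f"{digit}.body"


def tail(digit, nxt):
    """Tail unit of `digit`, conditioned on the following context."""
    return f"{digit}.tail>{nxt}"


def expand(string):
    """Map a list of digit words to its sequence of subword units."""
    units = []
    for i, d in enumerate(string):
        prev = string[i - 1] if i > 0 else BACKGROUND
        nxt = string[i + 1] if i + 1 < len(string) else BACKGROUND
        units += [head(d, prev), body(d), tail(d, nxt)]
    return units


def inventory():
    """Enumerate every distinct unit under the assumptions above."""
    units = set()
    for d in DIGITS:
        units.add(body(d))
        for c in CONTEXTS:
            units.add(head(d, c))
            units.add(tail(d, c))
    return units


if __name__ == "__main__":
    print(expand(["one", "two"]))
    print(len(inventory()))
```

Under these assumptions, the string "one two" joins the two digits through the tail of "one" conditioned on "two" and the head of "two" conditioned on "one", which is where cross-digit coarticulation would be captured. The inventory comes to 11 × (12 + 1 + 12) = 275 units.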
