Abstract

Recognition of spoken names is a challenging task for automatic speech recognition systems because the list of names for applications such as directory assistance tends to be on the order of several hundred thousand entries. This makes spoken name recognition a very high-perplexity task. In this paper, we propose the use of syllables as the acoustic unit for spoken name recognition based on reverse lookup schemes, and we show how syllables can be used to improve recognition performance and reduce system perplexity. We present system design methodologies that address the sparsity of acoustic training data encountered when using longer units such as syllables. We illustrate our ideas first on a TIMIT-based continuous speech recognition problem and then focus on applying them to spoken name recognition. Our results on the OGI spoken name corpus indicate that using syllables in place of phoneme models can significantly boost system accuracy while helping to reduce system complexity.

