Abstract
Automatic recognition of spoken languages has become an important feature in a variety of speech-enabled multilingual applications which, besides accuracy, also demand for efficient and linguistically scalable algorithms. This paper deals with a particularly successful approach based on phonotactic-acoustic features and presents systems for language identification as well as for unknown-language rejection. An architecture with multipath decoding, improved phonotactic models using binary-tree structures, and acoustic pronunciation models serve as a framework for experiments and discussion on these two tasks. In particular, language identification accuracy on a telephone-speech task (NIST'95 evaluation) in six and nine languages is presented together with results from a perceptual experiment carried out with human listeners. The performance of language rejection based on phonotactic modeling combined with a monolingual LVCSR system in the domain of broadcast news transcription is also reported. Besides yielding state-of-the-art performance, the described systems are computationally inexpensive and easily extensible (scalable) to new languages without the need for linguistic experts.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.