Abstract
In this paper, we study a novel approach to spoken language recognition using an ensemble of binary classifiers. In this framework, we begin by representing a speech utterance with a high-dimensional feature vector such as the phonotactic characteristics or the polynomial expansion of cepstral features. A binary classifier can be built based on such feature vectors. We adopt a distributed output coding strategy in ensemble classifier design, where we decompose a multiclass language recognition problem into many binary classification tasks, each of which addresses a language recognition subtask by using a component classifier. Then, we combine the results of the component classifiers to form an output code as a hypothesized solution to the overall language recognition problem. In this way, we effectively project high-dimensional feature vectors into a tractable low-dimensional space, yet maintaining language discriminative characteristics of the spoken utterances. By fusing the output codes from both phonotactic features and cepstral features, we achieve equal-error-rates of 1.38% and 3.20% for 30-s trials on the 2003 and 2005 NIST language recognition evaluation databases.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Audio, Speech and Language Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.