Abstract

Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems, often with results much better than those of a single SLR system. Phonotactic SLR subsystems may vary in their acoustic feature vectors or include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually at a high computational cost. In this paper, a new diversification for phonotactic language recognition systems is proposed using vector space models by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space by using the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are combined using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields equal error rates (EERs) of 1.39%, 3.63%, and 14.79% for the 30-, 10-, and 3-s test conditions, respectively, representing relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system.

Highlights

  • Spoken language recognition (SLR) refers to the task of automatic determination of language identity

  • In the language recognition system employed in this paper, we focus on how a change in the input to the support vector machine (SVM) affects the output

  • Selecting SVM supervector reconstruction methods is an open question, so here we propose some typical methods for implementation

Summary

Introduction

Spoken language recognition (SLR) refers to the task of automatically determining language identity. For spoken language recognition, the first and most essential step is to tokenize the running speech into sound units or lattices using a phone recognizer.

3.3 Perturbational SVM supervector reconstruction

Given a supervector φ(x) and some perturbation operator on φ(x), we are interested in understanding how a small perturbation added to the supervector affects the behavior of the SVM [33]. This relationship can be represented using a mapping onto a perturbational vector space.
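To make the idea of a supervector and its perturbation concrete, the following is a minimal sketch, not the paper's SSR algorithm. It assumes that φ(x) is a vector of bigram relative frequencies computed from a decoded phone sequence and that the SVM uses a linear kernel, so its score is w·φ(x) + b. The phone inventory, the weights `w`, the bias `b`, and the perturbation `delta` are all hypothetical values chosen for illustration. Under these assumptions, adding a perturbation δ to the supervector shifts the score by exactly w·δ.

```python
from collections import Counter
from itertools import product


def ngram_supervector(phones, inventory, n=2):
    """Map a decoded phone sequence to a vector of N-gram relative frequencies.

    Assumption: a 1-best phone string is used; lattice-based expected
    counts are not modeled here.
    """
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    # A fixed ordering over all possible N-grams gives every utterance
    # a vector with the same dimensions.
    return [counts[g] / total for g in product(inventory, repeat=n)]


def linear_svm_score(w, b, phi):
    """Decision value of a linear SVM: w . phi + b."""
    return sum(wi * xi for wi, xi in zip(w, phi)) + b


def perturb(phi, delta):
    """Additive perturbation operator (one hypothetical choice among many)."""
    return [xi + di for xi, di in zip(phi, delta)]


# Hypothetical three-phone inventory and decoded phone sequence.
inventory = ["a", "b", "k"]
phones = ["a", "b", "a", "k", "a", "b"]
phi = ngram_supervector(phones, inventory)

# Hypothetical trained SVM parameters (one weight per bigram dimension).
w = [0.5, -1.0, 0.25, 2.0, 0.0, 0.0, -0.5, 0.0, 0.0]
b = 0.1

# A small uniform perturbation of the supervector.
delta = [0.01] * len(phi)

base_score = linear_svm_score(w, b, phi)
perturbed_score = linear_svm_score(w, b, perturb(phi, delta))

# For a linear kernel the change in score equals w . delta.
score_change = perturbed_score - base_score
expected_change = sum(wi * di for wi, di in zip(w, delta))
```

Here `score_change` and `expected_change` agree up to floating-point rounding, which shows why the effect of a small perturbation on a linear SVM can be read directly from the weight vector. With a nonlinear kernel this identity would no longer hold, and the effect would have to be evaluated numerically.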

Homogeneous ensemble language recognition system
Experimental setup
Method
Functional SVM supervector reconstruction
Perturbational SVM supervector reconstruction
Conclusions
