Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction

Wei-Wei Liu, Michael T Johnson, Jia Liu, Wei-Qiang Zhang

doi:10.1186/s13636-014-0042-5

Abstract

Currently, acoustic spoken language recognition (SLR) and phonotactic SLR are widely used approaches to language recognition. To achieve better performance, researchers combine multiple subsystems, and the fused result is often much better than that of a single SLR system. Phonotactic SLR subsystems may differ in their acoustic feature vectors, or may include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually incur a high computational cost. In this paper, a new diversification for phonotactic language recognition systems is proposed using vector space models by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space by using the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are incorporated using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields equal error rates (EERs) of 1.39%, 3.63%, and 14.79% for the 30-, 10-, and 3-s test conditions, respectively, representing relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system.
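The efficiency argument rests on the fact that every subsystem reuses one shared front end: decoding into phones, then N-gram counting into a supervector. The following is a minimal sketch of that shared counting step, assuming 1-best phone strings and plain per-order relative frequencies; lattice-based expected counts and any term weighting the authors may apply are not shown. The function names and the toy phone inventory are hypothetical.

```python
from collections import Counter
from itertools import product


def ngrams(phones, n):
    """Return the n-grams of a phone sequence as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_vocabulary(phone_set, max_order):
    """Fix the supervector layout: one dimension per n-gram, orders 1..max_order."""
    grams = []
    for n in range(1, max_order + 1):
        grams.extend(product(phone_set, repeat=n))
    return {gram: idx for idx, gram in enumerate(grams)}


def supervector(phones, vocab, max_order):
    """Map a 1-best phone string to per-order relative n-gram frequencies.

    Every phone must belong to the inventory used to build `vocab`.
    """
    vec = [0.0] * len(vocab)
    for n in range(1, max_order + 1):
        grams = ngrams(phones, n)
        if not grams:  # sequence shorter than n: leave this order at zero
            continue
        total = len(grams)
        for gram, count in Counter(grams).items():
            vec[vocab[gram]] = count / total
    return vec


# Computed once per utterance; every subsystem then reuses `shared`.
phones = ["a", "b", "a", "b"]  # hypothetical decoder output
vocab = build_vocabulary(["a", "b"], max_order=2)
shared = supervector(phones, vocab, max_order=2)
```

Because decoding and counting dominate the cost, adding another subsystem in this design only adds a cheap transform of `shared` plus one more classifier.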

Highlights

Spoken language recognition (SLR) refers to the task of automatic determination of language identity
In the language recognition system employed in this paper, we focus on how a change in the input to the support vector machine (SVM) affects the output
Selecting SVM supervector reconstruction methods is an open question, so here we propose several typical methods for implementing it
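To make "a change in the input to the SVM" concrete, the sketch below derives several subsystem inputs from one shared supervector. The source names relative and functional reconstruction but does not define them here, so both mappings are HYPOTHETICAL stand-ins chosen only for illustration: the "relative" view re-expresses a supervector by its cosine similarity to reference supervectors (for example, per-language means), and the "functional" view applies an elementwise function (square root by default). Neither should be read as the paper's definition.

```python
import math


def _cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def relative_view(x, anchors):
    """HYPOTHETICAL 'relative' space: similarity of x to reference supervectors."""
    return [_cosine(x, a) for a in anchors]


def functional_view(x, f=math.sqrt):
    """HYPOTHETICAL 'functional' space: elementwise function of x (default sqrt)."""
    return [f(v) for v in x]


def ensemble_views(x, anchors):
    """Derive every subsystem's input from one shared supervector; no re-decoding."""
    return {
        "baseline": list(x),
        "relative": relative_view(x, anchors),
        "functional": functional_view(x),
    }
```

In an ensemble of this kind, each view would feed its own SVM, and the per-subsystem scores would then be fused by a backend such as the LDA-MMI backend mentioned in the abstract.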

Summary

Introduction

Spoken language recognition (SLR) refers to the task of automatically determining the identity of the language being spoken. For spoken language recognition, the first and most essential step is to tokenize the running speech into sound units or lattices using a phone recognizer. In perturbational SVM supervector reconstruction (Section 3.3), given a supervector φ(x) and some perturbation operator on φ(x), we are interested in understanding how a small perturbation added to the supervector affects the behavior of the SVM [33]. This relationship can be represented using a mapping onto a perturbational vector space.
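One way to picture such a mapping is as a vector of output sensitivities: for each perturbation direction d_k, measure how much the SVM decision value changes when φ(x) is nudged by a small step along d_k. The sketch below does this with finite differences on an already-trained RBF-kernel SVM. The perturbation directions, step size `eps`, kernel choice, and all function names are assumptions made for illustration; this is not presented as the paper's exact perturbation operator.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i * y_i) K(s_i, x) + b for an already-trained RBF SVM."""
    return sum(c * rbf_kernel(s, x, gamma) for s, c in zip(support, dual_coefs)) + bias


def perturbational_view(x, f, directions, eps=1e-3):
    """HYPOTHETICAL mapping: finite-difference sensitivity of f along each direction.

    Entry k is (f(x + eps * d_k) - f(x)) / eps, i.e. how a small perturbation
    of the supervector changes the SVM output.
    """
    base = f(x)
    view = []
    for d in directions:
        shifted = [xi + eps * di for xi, di in zip(x, d)]
        view.append((f(shifted) - base) / eps)
    return view


def toy_svm(x):
    """Illustrative one-support-vector SVM centered at the origin (made-up values)."""
    return svm_decision(x, support=[[0.0, 0.0]], dual_coefs=[1.0], bias=0.0, gamma=1.0)


features = perturbational_view([0.5, 0.0], toy_svm, directions=[[1.0, 0.0], [0.0, 1.0]])
```

For a purely linear decision function f(x) = w·x + b, each entry reduces to w·d_k and is the same for every utterance, so a perturbational view of this kind only carries utterance-specific information when the decision function is nonlinear.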

Homogeneous ensemble language recognition system

Experimental setup

Method

Functional SVM supervector reconstruction

Perturbational SVM supervector reconstruction

Conclusions


Journal: EURASIP Journal on Audio, Speech, and Music Processing	Publication Date: Dec 1, 2014
Citations: 27	License type: CC BY 4.0

Similar Papers

Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition
Wei-Wei Liu ... Jia Liu
Journal of Signal Processing Systems | VOL. 82
28 May 2015

Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition
Alicia Lozano-Diez ... Joaquin Gonzalez-Rodriguez
21 Nov 2018

Discriminative boosting regression backend for phonotactic language recognition
Wei-Wei Liu ... Wei-Qiang Zhang
01 Sep 2014

Spoken language recognition using a new conditional cascade method to combine acoustic and phonetic results
Shabnam Gholamdokht Firooz ... Yasser Shekofteh
International Journal of Speech Technology | VOL. 21
28 Jun 2018
