Abstract
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short-duration speech utterances. With the hypothesis that language information is weak, represented only latently in speech, and largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore, they may be susceptible to variations caused by different speakers, the specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional, compact representation of the original inputs with powerful descriptive and discriminative capability. To evaluate their effectiveness, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF-based i-vector representation for each speech utterance. Results on the NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the outputs of the phonotactic and acoustic approaches, we achieve EERs of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system is proposed.
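The DBF extraction described above can be sketched as a forward pass through a DNN whose narrow hidden layer yields the low-dimensional representation. The layer sizes, the activation function, and the bottleneck position below are illustrative assumptions (with random, untrained weights), not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical architecture: 39-dim acoustic input (e.g. MFCCs with deltas),
# two wide hidden layers, a narrow 40-dim bottleneck, then further layers
# leading to an output layer used only during supervised training.
sizes = [39, 512, 512, 40, 512, 1024]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

BOTTLENECK_LAYER = 3  # the 40-dim layer is the 3rd hidden layer here (an assumption)

def extract_dbf(frames):
    """Forward acoustic frames through the DNN and return the activations
    of the bottleneck layer as the frame-level DBFs."""
    h = frames
    for i, (w, b) in enumerate(zip(weights, biases), start=1):
        h = np.maximum(h @ w + b, 0.0)  # ReLU; the paper's activation may differ
        if i == BOTTLENECK_LAYER:
            return h  # low-dimensional, compact representation per frame
    return h

utterance = rng.standard_normal((100, 39))  # 100 frames of 39-dim features
dbf = extract_dbf(utterance)
print(dbf.shape)  # (100, 40)
```

In a full system, these frame-level DBFs would then replace the raw acoustic features as input to the Total Variability (i-vector) model.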
Highlights
Language identification (LID) is the task of determining the identity of the spoken language present within a speech utterance.
Using Deep Bottleneck Features (DBF), we present two Total Variability (TV) based acoustic systems, termed DBF-TV and parallel DBF-TV (PDBF-TV), to evaluate the effectiveness of DBFs for spoken LID.
The training utterances for each language came from two different channels: Conversational Telephone Speech (CTS) and narrow-band Voice of America (VOA) radio broadcasts.
Summary
Language identification (LID) is the task of determining the identity of the spoken language present within a speech utterance. A major problem in LID is how to design a language-specific and effective representation for speech utterances. Over the past few decades, intensive research efforts have studied the effectiveness of representations from various research domains, such as phonotactic and acoustic information [1,2,3], lexical knowledge [4], prosodic information [5], articulatory parameters [6], and universal attributes [7]. We mainly focus on the phonotactic and acoustic representations, which are considered the most common ones for LID [8,9].