Abstract

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (∼3s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those former results hardly reproducible. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25s down to 0.1s), showing that with as little as 0.5s an accuracy of over 50% can be achieved.

Highlights

  • Language identification (LID) aims to automatically determine which language is being spoken in a given segment of a speech utterance [1]

  • In order to gain better insight into the behavior of the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) system when dealing with out-of-set test segments, we show in Fig. 4 the confusion matrix of the best out-of-set system, oos_lstm_2_layer_512_units, when fed with real out-of-set test utterances

  • We present an analysis of the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) for Automatic Language Identification (LID) of short utterances

Introduction

Language identification (LID) aims to automatically determine which language is being spoken in a given segment of a speech utterance [1]. In a globalized world where the use of voice-operated systems is more common every day, LID typically acts as a pre-processing stage both for human listeners (e.g., call routing to a proper human operator) and for machine systems (e.g., multilingual speech processing systems) [2]. Driven by recent developments in speaker verification, the basic approach of these systems involves using i-vector front-end features followed by a classification stage that compensates for speaker and session variabilities [5,6,7]. An i-vector is a fixed-size representation (typically from 400 to 600 dimensions) of a whole utterance, derived as a point estimate of the latent variables of a total variability model.
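To make the i-vector idea concrete, the sketch below computes the standard point estimate (the posterior mean) of the latent variable from an utterance's Baum-Welch statistics: w = (I + TᵀΣ⁻¹N T)⁻¹ TᵀΣ⁻¹F. This is a minimal illustration, not the paper's pipeline: the dimensions are toy-sized (real i-vectors are 400-600 dimensional, as noted above), and the total variability matrix `T` and the statistics are randomly generated stand-ins for trained/accumulated quantities.

```python
import numpy as np

# Hypothetical toy dimensions for illustration: C Gaussian components,
# F-dimensional acoustic features, D-dimensional i-vector.
C, F, D = 8, 20, 4
rng = np.random.default_rng(0)

T = rng.standard_normal((C * F, D)) * 0.1   # total variability matrix (stand-in for a trained one)
Sigma_inv = np.ones(C * F)                  # inverse diagonal UBM covariances (stand-in)

# Baum-Welch statistics for one utterance (stand-ins for accumulated stats):
N = rng.uniform(1.0, 5.0, C)                # zeroth-order stats (frame occupancies per component)
F_stats = rng.standard_normal(C * F)        # centered first-order stats

# Posterior precision: I + T' Sigma^-1 N T, with N expanded per feature dim.
N_expanded = np.repeat(N, F)
precision = np.eye(D) + T.T @ (N_expanded[:, None] * Sigma_inv[:, None] * T)

# Point estimate (posterior mean) of the latent variable -- the i-vector.
ivector = np.linalg.solve(precision, T.T @ (Sigma_inv * F_stats))
print(ivector.shape)  # (D,): one fixed-size vector per utterance, regardless of its length
```

Whatever the utterance duration, the statistics collapse to fixed-size `N` and `F_stats`, which is why the resulting i-vector has a constant dimension and can feed a standard classifier.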
