Abstract

Language recognition systems based on bottleneck features have recently become the state of the art in this research field, as shown by their success in the last Language Recognition Evaluation (LRE 2015) organized by NIST (the U.S. National Institute of Standards and Technology). This type of system is based on a deep neural network (DNN) trained to discriminate between phonetic units, i.e. trained for the task of automatic speech recognition (ASR). This DNN compresses information in one of its layers, known as the bottleneck (BN) layer, which is used to obtain a new frame-level representation of the audio signal. This representation has been proven useful for the task of language identification (LID). Thus, bottleneck features are used as input to the language recognition system instead of a classical parameterization of the signal based on cepstral feature vectors such as MFCCs (Mel Frequency Cepstral Coefficients). Despite the success of this approach in language recognition, there is a lack of studies systematically analyzing how the topology of the DNN influences the performance of bottleneck feature-based language recognition systems. In this work, we try to fill this gap by analyzing language recognition results with different topologies for the DNN used to extract the bottleneck features, comparing them with each other and against a reference system based on a more classical cepstral representation of the input signal with a total variability model. In this way, we obtain useful knowledge about how the DNN configuration influences the performance of bottleneck feature-based language recognition systems.
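The extraction pipeline the abstract describes can be illustrated with a minimal sketch: a feedforward DNN maps cepstral input frames through hidden layers, one of which is deliberately narrow (the bottleneck), and the activations of that layer are read out as the new frame representation. The layer sizes, the use of ReLU/softmax, and the random weights below are illustrative assumptions for the sketch, not the topology or trained parameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (assumed, not the paper's actual topology):
# 39-dim cepstral input -> two 512-unit hidden layers -> 64-unit
# bottleneck -> 512 units -> softmax over 1000 phonetic targets.
sizes = [39, 512, 512, 64, 512, 1000]
BOTTLENECK_LAYER = 3  # index of the 64-unit layer in the activations list

weights = [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Run a batch of frames through the DNN; return every layer's activations."""
    activations = [x]
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ w + b
        if i < len(weights) - 1:
            activations.append(np.maximum(z, 0.0))  # ReLU hidden layers
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            activations.append(e / e.sum(axis=1, keepdims=True))  # softmax output
    return activations

def bottleneck_features(frames):
    """Use the (ASR-trained) DNN as a frame-level feature extractor:
    discard the classifier output and keep the bottleneck activations."""
    return forward(frames)[BOTTLENECK_LAYER]

frames = rng.standard_normal((100, 39))  # 100 synthetic cepstral frames
bn = bottleneck_features(frames)
print(bn.shape)  # (100, 64): one 64-dim bottleneck vector per frame
```

In a real system the network would first be trained on phonetic targets; only afterwards are the bottleneck activations collected for every frame and passed to the LID back-end (e.g. a total variability model) in place of the MFCC vectors.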

Highlights

  • The task of Language Recognition or Language Identification (LID) is defined as the task of identifying the language spoken in a given audio segment [1]

  • We evaluate the language recognition system on the test-development dataset described in Section Test Datasets, where we explore the influence of variations in the topology of the deep neural network (DNN), and we report results on the evaluation dataset of the Language Recognition Evaluation (LRE) 2015

  • It is very interesting to see that the system with the best performance in terms of phoneme frame accuracy does not yield the best bottleneck feature extractor for LID

Summary

Introduction

The task of Language Recognition or Language Identification (LID) is defined as the task of identifying the language spoken in a given audio segment [1]. Automatic LID systems aim to perform this task automatically, learning from a given dataset the parameters needed to identify new spoken data. This technology has multiple applications, for example: call centers that need to classify a call according to the language spoken, speech processing systems that deal with multilingual inputs, multimedia content indexing, or security applications such as tracking people by their language or accent.

Funding: partir de la Voz (TEC2015-68172-C2-1-P). Both projects are funded by Ministerio de Economía y Competitividad, Spain. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

