Abstract

The aim of this paper is to present two new variations of the frame-level cost function for training a deep neural network, with the goal of achieving better word error rates in speech recognition. The choice of objective function and the method used to minimize it are central to training neural networks, so their improvement is a salient research objective, and this paper addresses exactly that. The first proposed framework is based on the concept of extropy, the complementary dual of an uncertainty measure. The conventional cross-entropy function can be mapped to a non-uniform loss function based on its corresponding extropy, emphasizing frames whose assignment to specific senones is ambiguous. The second proposal fuses the mapped cross-entropy function with the idea of boosted cross-entropy, which emphasizes frames with a low target posterior probability. The proposed approaches were evaluated on a custom mid-vocabulary speaker-independent voice corpus, used for recognizing digit strings and personal name lists in Spanish from the north-central part of Mexico in a connected-words phone-dialing task. The two proposed approaches yield relative word error rate improvements of $12.3\%$ and $10.7\%$, respectively, over the conventional, well-established cross-entropy objective function.
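The abstract names the ingredients but not the exact formulations, which appear in the body of the paper. The following is a minimal NumPy sketch of one plausible reading, assuming the standard definition of extropy $J(p) = -\sum_i (1-p_i)\ln(1-p_i)$ and a focal-style boosting factor $(1-y_t)^\gamma$; the function names, the normalization by the uniform-distribution extropy, and the parameter gamma are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def extropy(p, eps=1e-12):
    """Extropy J(p) = -sum_i (1 - p_i) * ln(1 - p_i), the complementary
    dual of Shannon entropy; like entropy, it is maximal for a uniform p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum((1.0 - p) * np.log(1.0 - p), axis=-1)

def mapped_cross_entropy(posteriors, targets, eps=1e-12):
    """Hypothetical extropy-weighted cross-entropy: each frame's CE term is
    scaled by the normalized extropy of its posterior, so frames with
    ambiguous senone posteriors contribute more to the loss."""
    n = posteriors.shape[-1]
    # Normalize by the extropy of the uniform distribution so weights lie in (0, 1].
    j_max = extropy(np.full(n, 1.0 / n))
    w = extropy(posteriors) / j_max
    y_t = np.clip(posteriors[np.arange(len(targets)), targets], eps, 1.0)
    return np.mean(w * -np.log(y_t))

def boosted_mapped_cross_entropy(posteriors, targets, gamma=2.0, eps=1e-12):
    """Hypothetical fusion of the mapped loss with a focal-style factor
    (1 - y_t)^gamma that further emphasizes frames whose target posterior
    is low; gamma is a tuning parameter assumed here, not taken from the paper."""
    n = posteriors.shape[-1]
    j_max = extropy(np.full(n, 1.0 / n))
    w = extropy(posteriors) / j_max
    y_t = np.clip(posteriors[np.arange(len(targets)), targets], eps, 1.0)
    return np.mean(w * (1.0 - y_t) ** gamma * -np.log(y_t))

# Toy example: 3 frames, 4 senone classes.
post = np.array([[0.70, 0.10, 0.10, 0.10],   # confident frame
                 [0.30, 0.25, 0.25, 0.20],   # ambiguous frame
                 [0.05, 0.05, 0.05, 0.85]])  # confident frame
tgt = np.array([0, 0, 3])
print(mapped_cross_entropy(post, tgt))
print(boosted_mapped_cross_entropy(post, tgt))
```

On the toy batch, the ambiguous second frame receives the largest extropy weight, and the boosted variant additionally suppresses the two confident frames, matching the stated intent of the two proposals.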
