Low-rank and sparse subspace modeling of speech for DNN based acoustic modeling

Pranay Dighe, Afsaneh Asaei, Hervé Bourlard

doi:10.1016/j.specom.2019.03.004

Abstract

Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are typically very high-dimensional, recent studies have shown that the true class information is actually embedded in low-dimensional subspaces. Thus, a matrix of all posteriors belonging to a particular senone class is expected to have a very low rank. In this paper, we exploit Principal Component Analysis and Compressive Sensing based dictionary learning for low-rank and sparse modeling of senone subspaces, respectively. Our hypothesis is that the principal components of the DNN posterior space (termed eigen-posteriors in this work) and Compressive Sensing dictionaries can act as suitable models to extract the well-structured low-dimensional latent information and discard the undesirable high-dimensional unstructured noise present in the posteriors. The enhanced DNN posteriors thus obtained are used as soft targets for training better acoustic models to improve ASR. In this context, our approach also enables improving distant speech recognition by mapping far-field acoustic features to low-dimensional senone subspaces learned from near-field features. Experiments are performed on the AMI Meeting corpus in both close-talk (IHM) and far-field (SDM) microphone settings, where acoustic models trained using enhanced DNN posteriors outperform the conventional hard-target-based hybrid DNN-HMM systems. An information-theoretic analysis is also presented to show how low-rank and sparse enhancement modify the DNN posterior space to better match the assumptions of the hidden Markov model (HMM) backend.
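The low-rank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic posteriors, the function name `low_rank_enhance`, the chosen rank, and the clip-and-renormalize step are all assumptions made here for illustration. For each class, the posteriors assigned to that class are projected onto their top principal components, and the reconstruction is renormalized so that each row is again a probability distribution.

```python
import numpy as np


def low_rank_enhance(posteriors, labels, rank):
    """Project each class's posteriors onto its top-`rank` principal components.

    Hypothetical helper for illustration only; the clip-and-renormalize step
    is an assumption of this sketch, not something stated in the abstract.
    """
    enhanced = np.empty_like(posteriors)
    for c in np.unique(labels):
        idx = labels == c
        X = posteriors[idx]                       # (frames of class c, D)
        mean = X.mean(axis=0)
        # Rows of Vt are the principal directions of the centered class matrix.
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        V = Vt[:rank].T                           # (D, rank)
        recon = mean + (X - mean) @ V @ V.T       # low-rank reconstruction
        recon = np.clip(recon, 1e-10, None)       # keep entries positive
        enhanced[idx] = recon / recon.sum(axis=1, keepdims=True)
    return enhanced


# Synthetic stand-in for DNN posteriors: noisy softmax outputs over D classes.
rng = np.random.default_rng(0)
D, frames_per_class = 5, 200
labels = np.repeat(np.arange(D), frames_per_class)
logits = 4.0 * np.eye(D)[labels] + rng.normal(size=(labels.size, D))
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

enhanced = low_rank_enhance(post, labels, rank=2)
print(enhanced.shape)
```

In a soft-target setup such as the one the abstract describes, rows of `enhanced` would take the place of one-hot labels when training the acoustic model. The rank of 2 here is arbitrary.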


Journal: Speech Communication
Publication Date: Mar 26, 2019
Citations: 2

