Abstract

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means for obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of these filters is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments using the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classification, by modifying the cost function in sparse coding so that the class(es) represented by the exemplars are taken into account, in addition to the accuracy with which unknown observations are reconstructed. This creates two challenges: (1) developing a method for dictionary learning that takes the class occupancy of exemplars into account, and (2) developing a method for learning a mapping from exemplar activations to state posterior probabilities that preserves the generalization to unseen conditions that is one of the strongest advantages of sparse coding.
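To make the analyser pipeline concrete, the following is a minimal sketch of a modulation spectrum front end along the lines described above: a bank of gammatone filters, a Hilbert envelope per channel, and a set of modulation band-pass filters. The centre frequencies, the envelope downsampling rate, and the Butterworth modulation filters are illustrative assumptions; the paper's exact filter design is not reproduced here.

```python
# Minimal sketch of a modulation spectrum analyser: 15 gammatone channels,
# Hilbert envelopes, and 9 modulation band-pass filters (edges up to 16 Hz).
# Centre frequencies, the ~100 Hz envelope rate and the Butterworth design are
# assumptions for illustration, not the paper's exact configuration.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt, resample_poly

def gammatone_fir(fc, fs, duration=0.025, order=4):
    """Unit-energy FIR approximation of a 4th-order gammatone filter at fc Hz."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc                        # equivalent rectangular bandwidth
    g = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def modulation_features(x, fs, n_channels=15, env_fs=100,
                        mod_edges=(0.5, 1, 2, 3, 4, 6, 8, 10, 12, 16)):
    """Return an array of shape (n_channels * 9, n_envelope_samples)."""
    fcs = np.geomspace(100.0, 0.45 * fs, n_channels)    # log-spaced centre frequencies
    feats = []
    for fc in fcs:
        band = np.convolve(x, gammatone_fir(fc, fs), mode="same")
        env = np.abs(hilbert(band))                      # Hilbert envelope of the channel
        env = resample_poly(env, 1, int(fs // env_fs))   # downsample envelope to ~100 Hz
        for lo, hi in zip(mod_edges[:-1], mod_edges[1:]):
            sos = butter(2, [lo, hi], btype="bandpass", fs=env_fs, output="sos")
            feats.append(sosfiltfilt(sos, env))          # one modulation-filtered envelope
    return np.stack(feats)
```

Stacking these band-limited envelopes across all channels yields the kind of high-dimensional representation that the abstract refers to as the raw output of the modulation spectrum analyser.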

Highlights

  • Nobody will seriously disagree with the statement that most of the information in acoustic signals is encoded in the way in which the signal properties change over time, and that instantaneous characteristics, such as the shape or the envelope of the short-time spectrum, are less important, though surely not unimportant.

  • In experimenting with the AURORA-2 task, it is a pervasive finding that the results depend strongly on the word insertion penalty (WIP) used in the Viterbi back end.

  • In this paper we set aside a small development set, on which we searched for the WIP value that gave the best results in the conditions with signal-to-noise ratio (SNR) ≤ 5 dB; in these conditions the best performance was obtained with the same WIP value (a sketch of such a search is given below).
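As a rough illustration of this tuning procedure, the snippet below sweeps a grid of WIP values on the low-SNR development subset and keeps the value with the highest score. The helper decode_accuracy, the snr_db attribute, and the grid of values are hypothetical placeholders that only mirror the procedure described in the highlight above.

```python
# Hypothetical sketch of tuning the word insertion penalty (WIP) on a small
# development set restricted to SNR <= 5 dB. `decode_accuracy` stands in for a
# run of the Viterbi back end with a given WIP; it is not a real toolkit call.
def tune_wip(dev_utterances, decode_accuracy, wip_grid=(-20, -15, -10, -5, 0, 5, 10)):
    low_snr = [u for u in dev_utterances if u.snr_db <= 5]            # SNR <= 5 dB subset
    scores = {wip: decode_accuracy(low_snr, wip) for wip in wip_grid}
    return max(scores, key=scores.get)                                # best-scoring WIP
```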


Summary

Introduction

Nobody will seriously disagree with the statement that most of the information in acoustic signals is encoded in the way in which the signal properties change over time, and that instantaneous characteristics, such as the shape or the envelope of the short-time spectrum, are less important, though surely not unimportant. The dynamic changes over time in the envelope of the short-time spectrum are captured in the modulation spectrum [1,2,3]. This makes the modulation spectrum a fundamentally more informative representation of audio signals than a sequence of short-time spectra. In this paper we are concerned with the use of modulation spectra for automatic speech recognition (ASR), in particular noise-robust speech recognition. In this application domain, we cannot rely on the intervention of the human auditory system. It is necessary to automatically extract the information encoded in the modulation spectrum that humans would use to understand the message.
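The abstract obtains state posterior probabilities from the raw analyser output via sparse coding over a dictionary of labelled exemplars. A minimal sketch of one common way to realise this idea is given below, assuming a dictionary matrix of exemplars, a non-negative Lasso solver for the activations, and a simple pooling of activation mass per state; the solver and the pooling rule are illustrative assumptions, not the paper's exact cost function.

```python
# Minimal sketch of exemplar-based sparse coding for state posteriors, assuming
# a dictionary D of shape (n_features, n_exemplars), one state label per
# exemplar, and a non-negative Lasso solver; the paper's cost function may differ.
import numpy as np
from sklearn.linear_model import Lasso

def state_posteriors(frame, D, exemplar_states, n_states, alpha=0.1):
    """Reconstruct one observation as a sparse non-negative combination of
    exemplars and pool the activation mass per HMM state."""
    lasso = Lasso(alpha=alpha, positive=True, fit_intercept=False, max_iter=5000)
    lasso.fit(D, frame)                           # columns of D are exemplars
    activations = lasso.coef_                     # non-negative exemplar activations
    post = np.zeros(n_states)
    for a, s in zip(activations, exemplar_states):
        post[s] += a                              # sum activation mass per state
    total = post.sum()
    return post / total if total > 0 else np.full(n_states, 1.0 / n_states)
```

In the sparse-classification direction mentioned in the abstract, the exemplar labels would, roughly speaking, also enter the sparse coding cost function itself, rather than only the reconstruction error, but that extension is left open in the paper.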

