Dithering techniques in automatic recognition of speech corrupted by MP3 compression: Analysis, solutions and experiments

Michal Borsky, Petr Mizera, Petr Pollak, Jan Nouza

doi:10.1016/j.specom.2016.11.007

Abstract

A large portion of the audio files distributed over the Internet or stored in personal and corporate media archives are in compressed form. Several compression techniques and algorithms exist, but MPEG Layer-3 (known as MP3) has achieved particularly wide popularity in general audio coding, and in speech coding as well. However, the algorithm is lossy and introduces distortion into the spectral and temporal characteristics of a signal. In this paper, we study its impact on automatic speech recognition (ASR). We show that, as the MP3 bitrate decreases, the major source of ASR performance degradation is deep spectral valleys (i.e., bins with almost zero energy) caused by the masking effect of the MP3 algorithm. We demonstrate that these unnatural gaps in the spectrum can be effectively compensated for by adding a certain amount of noise to the distorted signal. We provide the theoretical background for this approach and show that the added noise affects mainly the spectral valleys: they are filled by the noise, while the spectral bins containing speech remain almost unchanged. This helps to restore a more natural shape of the log spectrum and cepstrum, and consequently has a positive impact on ASR performance. In our previous work, we proposed two variants of the signal dithering (noise addition) technique, one applied globally and the other applied more selectively. In this paper, we offer a more detailed insight into their performance. We report results from many experiments in which we test them in various scenarios, using a large vocabulary continuous speech recognition (LVCSR) system, acoustic models based on Gaussian mixture models (GMM) as well as on deep neural networks (DNN), and multiple speech databases in three languages (Czech, English and German). Our results show that both proposed techniques, and the selective dithering method in particular, consistently compensate for the negative impact of MP3 compression on ASR performance.
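The valley-filling effect described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the toy four-harmonic frame standing in for MP3-decoded speech, the 40 dB signal-to-noise ratio, the 512-sample frame length, and the helper names `add_dither` and `log_power_spectrum` are all assumptions chosen for illustration. It shows only the global variant (white noise added to the whole signal), not the selective one.

```python
import numpy as np

def add_dither(signal, snr_db, rng):
    """Add white Gaussian noise at a chosen signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def log_power_spectrum(frame, floor=1e-12):
    """Log power spectrum in dB; `floor` avoids log(0) in empty bins."""
    return 10.0 * np.log10(np.abs(np.fft.rfft(frame)) ** 2 + floor)

rng = np.random.default_rng(0)
n = 512                                   # frame length (assumed)
speech_bins = np.array([10, 20, 30, 40])  # bins that carry "speech" energy (assumed)
samples = np.arange(n)

# Toy frame: sinusoids placed exactly on FFT bin centres, so every other
# bin is (numerically) empty and plays the role of a deep spectral valley.
frame = np.sin(2.0 * np.pi * np.outer(speech_bins, samples) / n).sum(axis=0)

valley_mask = np.ones(n // 2 + 1, dtype=bool)
valley_mask[speech_bins] = False

before = log_power_spectrum(frame)
after = log_power_spectrum(add_dither(frame, snr_db=40.0, rng=rng))

# Largest change in the bins that contain speech, and the average
# rise of the log spectrum in the valley bins, both in dB.
speech_change = np.max(np.abs(after[speech_bins] - before[speech_bins]))
valley_rise = np.mean(after[valley_mask] - before[valley_mask])

print(f"max change in speech bins: {speech_change:.3f} dB")
print(f"mean rise in valley bins:  {valley_rise:.1f} dB")
```

In this toy setting the noise floor lifts the empty bins by many tens of dB on the log scale, while the bins with strong speech energy move by only a small fraction of a dB, which is the asymmetry the abstract relies on. The appropriate noise level for real MP3-compressed speech is a tuning question that this sketch does not address.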

Journal: Speech Communication
Publication Date: Nov 24, 2016
Citations: 2
