Advanced acoustic modelling techniques in MP3 speech recognition

Michal Borsky, Petr Pollak, Petr Mizera

doi:10.1186/s13636-015-0064-7

Abstract

The automatic recognition of MP3 compressed speech presents a challenge to current systems because the compression is lossy and causes irreversible degradation of the speech wave. This article evaluates the performance of a recognition system optimized for MP3 compressed speech with current state-of-the-art acoustic modelling techniques and one specific front-end compensation method. The article concentrates on acoustic model adaptation, discriminative training, and additional dithering as prominent means of compensating for the described distortion in the tasks of phoneme recognition and large vocabulary continuous speech recognition (LVCSR). The experiments on the phoneme task show a dramatic increase in the recognition error for unvoiced speech units as a direct result of compression. Acoustic model adaptation yielded the highest relative contribution, while the gain from discriminative training diminished with decreasing bit-rate. Additional dithering yielded a consistent improvement only for the MFCC features, but the overall results were still worse than those for the PLP features.
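The additional dithering mentioned in the abstract can be pictured as adding low-level noise to the decoded signal before feature extraction. The following Python sketch is only an illustration under stated assumptions: the Gaussian noise distribution, the function name `add_dither`, and the `dither_level` parameter are hypothetical and are not taken from the article.

```python
import random


def add_dither(samples, dither_level=1.0, seed=None):
    """Return a copy of `samples` with zero-mean Gaussian noise added.

    `dither_level` is the standard deviation of the noise (an assumed,
    illustrative parameter; the article's actual setting is not given here).
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, dither_level) for s in samples]


# A run of digital silence, such as a codec may produce in suppressed regions.
silence = [0.0] * 8
dithered = add_dither(silence, dither_level=1.0, seed=0)
```

After dithering, the silent segment is no longer exactly zero, which is the property a front-end typically relies on when dithering is applied.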

Highlights

The aim of automatic speech recognition (ASR) research is to develop an intermediary system for the purpose of human speech transcription, where the construction and block architecture are often customized
The results were evaluated by the phone error rate (PER) and the phone error rate reduction (PERR) criteria: $\mathrm{PER} = \frac{S + D + I}{N} \times 100\ [\%]$
Its application was useful for baseline and linear discriminant analysis (LDA) models, but the reduction for more advanced acoustic modelling techniques was only marginal, and higher bit-rates were mainly unaffected by the method
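The PER criterion named in the highlights can be sketched in a few lines of Python. This is a minimal illustration, assuming the conventional meaning of the symbols: S substitutions, D deletions, I insertions, and N phones in the reference transcription. The PERR helper assumes a relative reduction with respect to a baseline; the page does not give the PERR formula, so that definition and both function names are assumptions.

```python
def phone_error_rate(substitutions, deletions, insertions, num_reference_phones):
    """PER in percent: 100 * (S + D + I) / N."""
    if num_reference_phones <= 0:
        raise ValueError("num_reference_phones must be positive")
    errors = substitutions + deletions + insertions
    return 100.0 * errors / num_reference_phones


def relative_error_reduction(per_baseline, per_system):
    """Assumed PERR definition: relative PER reduction in percent."""
    if per_baseline <= 0:
        raise ValueError("per_baseline must be positive")
    return 100.0 * (per_baseline - per_system) / per_baseline


# Made-up counts for illustration only, not results from the article.
per = phone_error_rate(substitutions=10, deletions=5, insertions=5,
                       num_reference_phones=100)
perr = relative_error_reduction(per_baseline=40.0, per_system=30.0)
```

Because insertions count as errors, PER can exceed 100 % when a system outputs many more phones than the reference contains.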

Summary

Introduction

The aim of automatic speech recognition (ASR) research is to develop an intermediary system for the purpose of human speech transcription, where the construction and block architecture are often customized. This article investigates the performance of current state-of-the-art acoustic modelling (AM) and feature extraction techniques in the tasks of phoneme recognition and large vocabulary continuous speech recognition of MP3 compressed speech.

Results

Conclusion

Journal: EURASIP Journal on Audio, Speech, and Music Processing	Publication Date: Jul 28, 2015
Citations: 5	License type: CC BY 4.0

Similar Papers

Improved Recognition of Spontaneous Hungarian Speech—Morphological and Acoustic Modeling Techniques for a Less Resourced Task
P Mihajlik ... Z Tuske
IEEE Transactions on Audio, Speech, and Language Processing | VOL. 18
01 Aug 2010

Missing feature reconstruction and acoustic model adaptation combined for large vocabulary continuous speech recognition
25 Aug 2008

Acoustic and pronunciation model adaptation for context-independent and context-dependent pronunciation variability of non-native speech
Yoo Rhee Oh ... Mina Kim
01 Mar 2008

Comparison of Grapheme and Phoneme Based Acoustic Modeling in LVCSR Task in Slovak
Michal Mirilovič ... Anton Čižmár
01 Jan 2009
