Monaural multi-talker speech recognition using factorial speech processing models

Mahdi Khademian, Mohammad Mehdi Homayounpour

doi:10.1016/j.specom.2018.01.007

Abstract

The PASCAL monaural speech separation and recognition challenge targeted robust automatic speech recognition in the presence of speech-like noise, which significantly degrades the performance of automatic speech recognition systems. In this challenge, two competing speakers each say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, a team from IBM Research achieved better-than-human performance on this task during the challenge. The IBM system consists of an intermediate speech separation module and two single-talker speech recognition modules. This paper revisits the recognition task of the challenge using gain-adapted factorial speech processing models. It develops a joint token passing algorithm that decodes the target and masker speakers directly and simultaneously from the mixed signal. The joint decoding uses maximum uncertainty, which cannot be used in the two-phase IBM system. The paper provides a detailed derivation of inference in these models based on the general inference procedures of probabilistic graphical models. It also uses deep neural networks for joint speaker identification and gain estimation, which makes these two steps easier than before while still producing competitive results. The proposed method outperforms the earlier super-human results and the results recently achieved with deep neural networks by Microsoft Research: it achieves a 5.3% absolute improvement in task performance over the first super-human system and a 2.5% absolute improvement over its recent competitor.
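
The idea of decoding both speakers jointly from one mixed signal can be illustrated with a toy factorial hidden Markov model. The sketch below is not the paper's joint token passing algorithm and omits gain adaptation, grammars, and real acoustic models. It assumes, purely for illustration, that each state emits a scalar log-energy, that the mixture is scored with the log-max approximation (the louder source dominates), and that the function name `joint_viterbi` and all parameters are hypothetical.

```python
import itertools
import math


def joint_viterbi(obs, means_a, means_b, trans_a, trans_b, sigma=1.0):
    """Viterbi search over the product state space of two independent chains.

    Assumptions (illustrative, not taken from the paper): scalar features,
    a uniform initial state prior, strictly positive transition
    probabilities, and a log-max interaction: y ~ N(max(mu_a, mu_b), sigma^2).
    """
    joint_states = list(
        itertools.product(range(len(means_a)), range(len(means_b)))
    )

    def log_emit(y, sa, sb):
        # Log-max approximation: the mixture is explained by the louder source.
        mu = max(means_a[sa], means_b[sb])
        return -0.5 * ((y - mu) / sigma) ** 2

    def log_trans(prev, cur):
        # The two chains evolve independently, so the joint transition
        # probability factorizes into one term per speaker.
        return (math.log(trans_a[prev[0]][cur[0]])
                + math.log(trans_b[prev[1]][cur[1]]))

    # Initialization with a uniform prior (constant term dropped).
    score = {s: log_emit(obs[0], *s) for s in joint_states}
    backpointers = []

    for y in obs[1:]:
        new_score, pointer = {}, {}
        for cur in joint_states:
            best_prev = max(
                joint_states, key=lambda p: score[p] + log_trans(p, cur)
            )
            new_score[cur] = (score[best_prev]
                              + log_trans(best_prev, cur)
                              + log_emit(y, *cur))
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)

    # Backtrace from the best final joint state.
    path = [max(joint_states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()

    # Split the joint path into one state sequence per speaker.
    return [s[0] for s in path], [s[1] for s in path]


if __name__ == "__main__":
    sticky = [[0.9, 0.1], [0.1, 0.9]]
    states_a, states_b = joint_viterbi(
        [1.0, 1.0, 5.0, 5.0, 8.0, 8.0],
        means_a=[0.0, 5.0], means_b=[1.0, 8.0],
        trans_a=sticky, trans_b=sticky,
    )
    print(states_a, states_b)
```

Because the search runs over pairs of states, its cost grows with the product of the two state counts. A two-stage pipeline, by contrast, commits to a separated signal before any recognition takes place.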

Journal: Speech Communication
Publication Date: Feb 2, 2018
Citations: 15

Similar Papers

Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition
Masoud Geravanchizadeh ... Meysam Bashirpour
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2021
04 Aug 2021

A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks
Jun Du ... Yanhui Tu
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 24
01 Aug 2016

Feature Level Solution to Noise Robust Speech Recognition in the context of Tonal Languages
Utpal Bhattacharjee ... Jyoti Mannala
International Journal of Engineering and Advanced Technology | VOL. 9
30 Dec 2020

Automatic and human speech recognition in null grammar
Amit Juneja
The Journal of the Acoustical Society of America | VOL. 130
01 Oct 2011
