Speech recognition robust against speech overlapping in monaural recordings of telephone conversations

Masayuki Suzuki, Ryuki Tachibana, Tohru Nagano, Gakuto Kurata

doi:10.1109/icassp.2016.7472766

Abstract

Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, automatic speech recognition is less accurate on a monaural recording than on a multi-channel recording of the same conversation in which each speaker's voice is recorded separately. The major reason is that the recognition system fails not only at the overlapping segments, where the voices of multiple speakers overlap, but also at the neighboring segments surrounding them. In this paper, we tackle this problem by using a combination of garbage modeling and noise-robust monaural acoustic modeling. Our proposed method trains the models by making use of multi-channel recordings and transcripts, which are easier to prepare than monaural recordings and transcripts. We present experimental results where the proposed methods reduced the error rates by approximately 3% relative to the baseline methods for both the GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, it is easy to deploy to a wide range of ASR systems for monaural speech transcription.
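As an illustration of how multi-channel recordings and their time-aligned transcripts could yield monaural training material, the sketch below down-mixes two channels and locates the spans where both speakers talk at once, which are the spans a garbage model might absorb. The abstract does not describe the actual procedure, so the function names (`downmix`, `overlap_segments`, `relative_reduction`), the averaging down-mix, and the segment format are all assumptions made for illustration. The last helper shows the arithmetic behind a relative error-rate reduction, assuming the 3% figure is meant in that sense.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_sec, end_sec) of one speaker's utterance.
Segment = Tuple[float, float]


def downmix(ch_a: List[float], ch_b: List[float]) -> List[float]:
    """Average two equally long channels sample by sample into one mono signal."""
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must have the same length")
    return [(a + b) / 2.0 for a, b in zip(ch_a, ch_b)]


def overlap_segments(segs_a: List[Segment], segs_b: List[Segment]) -> List[Segment]:
    """Return the time spans where an utterance of speaker A intersects one of speaker B."""
    overlaps = []
    for a_start, a_end in segs_a:
        for b_start, b_end in segs_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # strictly positive duration only
                overlaps.append((start, end))
    return sorted(overlaps)


def relative_reduction(baseline_err: float, new_err: float) -> float:
    """Relative error-rate reduction, e.g. 0.03 for a 3% relative improvement."""
    return (baseline_err - new_err) / baseline_err
```

For example, with made-up utterance times `[(0.0, 2.0), (5.0, 6.0)]` for one speaker and `[(1.5, 3.0), (6.0, 7.0)]` for the other, `overlap_segments` reports a single overlapping span from 1.5 s to 2.0 s; utterances that merely touch at 6.0 s are not counted.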

Similar Papers

Strategies for improving audible quality and speech recognition accuracy of reverberant speech
B.W Gillespie ... A.E Atlas
06 Apr 2003

Multichannel Wiener Filter with Early Reflection Raking for Automatic Speech Recognition in Presence of Reverberation
Konrad Kowalczyk
01 Sep 2019

A Joint Approach for Single-Channel Speaker Identification and Speech Separation
Pejman Mowlaee ... Mads Græsbøll Christensen
IEEE Transactions on Audio, Speech, and Language Processing | VOL. 20
01 Nov 2012

Analyzing temporal transition of real user's behaviors in a spoken dialogue system
Kazunori Komatani ... Tatsuya Kawahara
27 Aug 2007
