Robust Distant Speech Recognition by Combining Multiple Microphone-Array Processing with Position-Dependent CMN

Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa

doi:10.1155/asp/2006/95491

Abstract

We propose a robust distant speech recognition method that combines multiple microphone-array processing with position-dependent cepstral mean normalization (CMN). In the recognition stage, the system estimates the speaker position and adopts the compensation parameters that were estimated a priori for that position. The system then applies CMN to the speech (i.e., position-dependent CMN) and performs speech recognition for each channel. The features obtained from the multiple channels are integrated in one of two ways. The first, called multiple-decoder processing, uses the maximum vote or the maximum summation likelihood of the recognition results from the multiple channels to obtain the final result. The second, called single-decoder processing, calculates the output probability of each input at the frame level and uses a single decoder on these output probabilities to perform speech recognition, which lowers the computational cost. We combine delay-and-sum beamforming with multiple-decoder processing or single-decoder processing, and term the combination multiple microphone-array processing. We evaluated the proposed method on distant isolated-word recognition with a limited vocabulary (100 words) in a real environment. The proposed multiple microphone-array processing using multiple decoders with position-dependent CMN achieved a 3.2% improvement (50% relative error reduction rate) over delay-and-sum beamforming with conventional CMN (i.e., the conventional method). Multiple microphone-array processing using a single decoder needs about one-third of the computational time of that using multiple decoders, without degrading speech recognition performance.
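As a rough illustration of the delay-and-sum beamforming mentioned in the abstract, the sketch below advances each channel by its arrival delay and averages the aligned signals. It is a minimal sketch, not the authors' implementation: the function name `delay_and_sum`, the use of non-negative integer sample delays, and the toy pulse signal are all assumptions made for the example.

```python
def delay_and_sum(channels, delays):
    """Average the channels after advancing each one by its delay (in samples).

    Assumption for this sketch: delays are non-negative integers, one per channel.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    n_samples = len(channels[0])
    output = [0.0] * n_samples
    for signal, delay in zip(channels, delays):
        # Advance the channel by `delay` samples; the tail that would need
        # samples past the end of the signal is left at zero.
        for t in range(n_samples - delay):
            output[t] += signal[t + delay]
    return [value / len(channels) for value in output]


# Toy data (hypothetical): the same unit pulse reaches three microphones
# 0, 2 and 3 samples late.
pulse_at = 5
delays = [0, 2, 3]
channels = [[1.0 if t == pulse_at + d else 0.0 for t in range(16)] for d in delays]
enhanced = delay_and_sum(channels, delays)
```

After alignment the three pulses add coherently at sample 5, whereas uncorrelated noise on the individual channels would tend to average out.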

Highlights

Automatic speech recognition (ASR) systems are known to perform reasonably well when the speech signals are captured using a close-talking microphone.
We proposed a robust distant speech recognition system based on position-dependent cepstral mean normalization (CMN) using multiple microphones.
The speaker position in 3D space could be estimated quickly, and a channel distortion compensation method based on position-dependent CMN was adopted to compensate for the transmission characteristics.
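The idea of position-dependent CMN can be sketched as a table lookup followed by a per-frame subtraction. This is only an illustration under stated assumptions: the page does not give the exact form of the compensation parameter, so the sketch assumes it is a mean cepstral vector stored a priori for each calibrated position. The names `position_dependent_cmn`, `cepstral_means`, `area_1`, and `area_2`, and all numeric values, are hypothetical.

```python
def position_dependent_cmn(frames, position, cepstral_means):
    """Subtract the cepstral mean stored for `position` from every frame."""
    mean = cepstral_means[position]
    return [[c - m for c, m in zip(frame, mean)] for frame in frames]


# Hypothetical a priori table: one mean cepstral vector per calibrated position.
cepstral_means = {
    "area_1": [1.0, -0.5, 0.25],
    "area_2": [0.5, 0.0, -0.25],
}

# Two toy cepstral frames from a speaker whose estimated position is "area_1".
frames = [[1.5, 0.0, 0.25], [0.5, -1.0, 0.25]]
normalized = position_dependent_cmn(frames, "area_1", cepstral_means)
```

Because the mean is looked up rather than computed from the utterance itself, normalization can start with the first frame. Conventional CMN, by contrast, subtracts the mean of the current utterance and so needs the whole utterance first.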

Summary

INTRODUCTION

Automatic speech recognition (ASR) systems are known to perform reasonably well when the speech signals are captured using a close-talking microphone. We propose a robust speech recognition method using a new real-time CMN based on speaker position, which we call position-dependent CMN. The system adopts the compensation parameter corresponding to the estimated position, applies a channel distortion compensation method to the speech (i.e., position-dependent CMN), and performs speech recognition. The maximum vote (i.e., voting method (VM)) or the maximum summation likelihood (i.e., maximum-summation-likelihood method (MSLM)) of all channels is used to obtain the final result [12], which is called multiple-decoder processing. This should obtain robust performance in a distant environment. Multiple microphone-array processing using multiple decoders or a single decoder is also proposed. Section 5 describes the experimental results of distant speech recognition in a real environment.
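The two multiple-decoder combination rules named above, VM and MSLM, can be illustrated with a small sketch. It assumes that each channel's decoder returns a log-likelihood for every vocabulary word and that MSLM sums these log-likelihoods across channels; the function names, the tie-breaking behaviour (first word encountered wins), and the toy scores are assumptions, not details taken from the paper.

```python
from collections import Counter


def voting_method(channel_scores):
    """VM: return the word ranked first by the largest number of channels."""
    winners = [max(scores, key=scores.get) for scores in channel_scores]
    return Counter(winners).most_common(1)[0][0]


def max_summation_likelihood(channel_scores):
    """MSLM: return the word with the largest score summed over all channels."""
    totals = {}
    for scores in channel_scores:
        for word, log_likelihood in scores.items():
            totals[word] = totals.get(word, 0.0) + log_likelihood
    return max(totals, key=totals.get)


# Hypothetical per-channel log-likelihoods for a three-word vocabulary.
channel_scores = [
    {"tokyo": -10.0, "kyoto": -11.0, "osaka": -20.0},
    {"tokyo": -10.0, "kyoto": -11.0, "osaka": -20.0},
    {"tokyo": -30.0, "kyoto": -12.0, "osaka": -20.0},
]
vm_result = voting_method(channel_scores)
mslm_result = max_summation_likelihood(channel_scores)
```

In this toy case the two rules disagree: two of three channels vote for "tokyo", but the third channel scores it so poorly that "kyoto" has the larger summed score. The example shows only that the rules can differ, not which one performs better.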

SPEAKER POSITION ESTIMATION

Conventional CMN and real-time CMN

Incorporate speaker position information into real-time CMN

Problem and solution

Multiple-decoder processing

Voting method

Maximum-summation-likelihood method

Single-decoder processing

Multiple microphone-array processing

Experimental setup

Recognition experiment for speech emitted by a loudspeaker

Recognition experiment of speech uttered by humans

Experimental results for multiple-microphone speech processing

Findings

CONCLUSION AND FUTURE WORK

Journal: EURASIP Journal on Advances in Signal Processing	Publication Date: Aug 13, 2006
Citations: 42	License type: cc-by

Similar Papers

Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
L Wang ... N Kitaoka
IEICE Transactions on Information and Systems | VOL. E91-D
01 Mar 2008

Robust distant speech recognition based on position dependent CMN using a novel multiple microphone processing technique
Longbiao Wang ... Norihide Kitaoka
04 Sep 2005

An analysis-by-synthesis approach to vocal tract modeling for robust speech recognition
Ziad Al Bawab
01 Jan 2012

Cepstral Feature Normalization Method Using Pole Filtering and Scale Normalization for Robust Speech Recognition
Bo Kyeong Choi ... Sung Min Ban
The Journal of the Acoustical Society of Korea | VOL. 34
31 Jul 2015