MMSE-based channel error mitigation for distributed speech recognition

Antonio M Peinado, José L Perez-Cordoba, Victoria Sanchez, José C Segura

doi:10.21437/eurospeech.2001-633

Abstract

Recently, the first version of an ETSI standard for Distributed Speech Recognition has been proposed. The main benefit of this approach is the possibility of maintaining a high recognition performance when accessing remote information systems. The use of a digital channel for transmission of the encoded speech parameters implies the introduction of several channel distortions. Our paper deals with the mitigation of such distortions. We study the application of MMSE estimation to this problem and propose a new MMSE procedure that obtains the probabilities needed for MMSE from a forward-backward algorithm. We show that MMSE estimation obtains better performance than the mitigation algorithm described in the ETSI standard under different channel conditions.

1. Introduction

Very recently, the problem of recognizing speech transmitted over digital channels has been addressed and an ETSI standard has been developed (ETSI-ES-201-108 [1]). The AURORA working group was responsible for developing this first standard, and a Distributed Speech Recognition (DSR) approach, that is, a local front-end and a remote back-end, was adopted. There are clear advantages in this approach: voice features are not affected by the speech coder, there is greater robustness against channel errors, and access is possible from different networks with a guaranteed performance.

An important issue currently being addressed is robustness against adverse environments (in which the front-end of a DSR system must operate). Robustness against transmission channel errors must also be taken into account. This is not exclusively a channel coding problem. In recent years, several error mitigation (or concealment) techniques that provide improved decoding have been studied for speech and image coding [2] [3]. These techniques usually exploit some kind of knowledge about the encoded parameters, which is embedded in a soft decoding scheme.
In the case of DSR, we find that the encoded parameters (MFCCs in the current version of the standard) differ from those normally utilized in speech coding. Moreover, the goal of DSR is completely different from subjective vision or hearing, since at the back-end we find an automatic speech recognition system. Therefore, the development of specific mitigation algorithms for DSR is clearly justified. The ETSI DSR standard already includes a basic mitigation algorithm that has been shown to be quite effective for medium and good quality channels in TETRA and GSM environments [4]. Error mitigation can be of interest not only for DSR but also for other applications, such as speech reconstruction from the transmitted DSR speech features.

In this paper, we address the problem of mitigating channel errors, studying the performance of mitigation algorithms based on an MMSE (Minimum Mean Square Error) philosophy. In particular, we propose a new MMSE mitigation algorithm that utilizes correct frames received before and after the frame being estimated. The different proposed techniques are developed using the AURORA ETSI standard front-end, although they could be straightforwardly extended to other encoding schemes. The proposed mitigation algorithms affect only the decoding stage of the ETSI standard. For the sake of simplicity, we will assume BPSK modulation and test two different data channels (AWGN and bursty). The recognition experiments are performed on the Aurora-2 speech database.

The paper is organized as follows. First, we briefly summarize the ETSI DSR standard and its error mitigation algorithm. Sections 3 and 4 are devoted to the study of several mitigation techniques over AWGN and bursty channels, respectively. Finally, the conclusions of this work are summarized.
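To make the MMSE idea mentioned in the introduction more concrete, the following minimal Python sketch shows one generic way to combine forward-backward posteriors with a quantizer codebook. It is an illustration under stated assumptions, not the algorithm of this paper: it assumes a first-order Markov model over quantizer indices, and the names `codebook`, `transitions`, `prior` and `likelihoods` are hypothetical inputs chosen for the example. The estimate for each frame is the posterior-weighted mean of the codebook centroids, so frames received before and after the current one both contribute.

```python
import numpy as np


def forward_backward_posteriors(likelihoods, transitions, prior):
    """Return gamma[t, i] = P(index_t = i | observations of all frames).

    likelihoods[t, i]: P(received data of frame t | transmitted index i)
    transitions[i, j]: P(index_{t+1} = j | index_t = i)  (assumed Markov model)
    prior[i]:          P(index_0 = i)
    """
    num_frames, num_codes = likelihoods.shape
    alpha = np.zeros((num_frames, num_codes))
    beta = np.ones((num_frames, num_codes))

    # Forward pass: uses the current frame and all earlier frames.
    alpha[0] = prior * likelihoods[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, num_frames):
        alpha[t] = (alpha[t - 1] @ transitions) * likelihoods[t]
        alpha[t] /= alpha[t].sum()  # rescale to avoid numerical underflow

    # Backward pass: uses all later frames.
    for t in range(num_frames - 2, -1, -1):
        beta[t] = transitions @ (likelihoods[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def mmse_estimate(gamma, codebook):
    """MMSE estimate per frame: posterior-weighted mean of the centroids."""
    return gamma @ codebook


# Hypothetical toy setup: two centroids, three frames, and an
# uninformative middle frame (as if it had been corrupted by the channel).
codebook = np.array([[0.0], [10.0]])
transitions = np.array([[0.9, 0.1], [0.1, 0.9]])
prior = np.array([0.5, 0.5])
likelihoods = np.array([[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]])

gamma = forward_backward_posteriors(likelihoods, transitions, prior)
estimates = mmse_estimate(gamma, codebook)
```

In this toy setup the middle frame carries no channel information of its own, so its estimate is determined entirely by the neighbouring frames through the assumed transition model.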
