Abstract

Recently, the first version of an ETSI standard for Distributed Speech Recognition has been proposed. The main benefit of this approach is that it can maintain high recognition performance when accessing remote information systems. However, transmitting the encoded speech parameters over a digital channel introduces several channel distortions. This paper deals with the mitigation of such distortions. We study the application of MMSE estimation to this problem and propose a new MMSE procedure that obtains the probabilities needed for MMSE estimation from a forward-backward algorithm. We show that MMSE estimation performs better than the mitigation algorithm described in the ETSI standard under different channel conditions.

1. Introduction

Very recently, the problem of recognizing speech transmitted over digital channels has been addressed, and an ETSI standard has been developed (ETSI-ES-201-108 [1]). The AURORA working group was responsible for developing this first standard, and it adopted a Distributed Speech Recognition (DSR) approach, that is, a local front-end and a remote back-end. This approach has clear advantages: the voice features are not affected by the speech coder, robustness against channel errors is greater, and the system can be accessed from different networks with guaranteed performance.

An important issue currently being addressed is robustness against the adverse environments in which the front-end of a DSR system must operate. Robustness against transmission channel errors must also be taken into account, and this is not exclusively a channel coding problem. In recent years, several error mitigation (or concealment) techniques that improve decoding have been studied for speech and image coding [2] [3]. These techniques usually exploit some prior knowledge about the encoded parameters, embedded in a soft decoding scheme.
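As a rough illustration of how an MMSE soft decoder exploits such knowledge, the following Python sketch estimates a quantized parameter vector from the received soft BPSK samples of its codebook index on an AWGN channel. The codebook, the bit-to-symbol mapping (bit 0 to +1, bit 1 to -1, most significant bit first), the index prior and the noise variance are all hypothetical placeholders, not values taken from the ETSI standard. The estimate is the posterior-weighted mean of the centroids, x_hat = sum_i P(i | y) c_i, with P(i | y) proportional to P(y | i) P(i).

```python
import numpy as np


def index_to_bpsk(num_bits):
    """BPSK symbols for every index, shape (2**num_bits, num_bits).

    Hypothetical mapping: bit 0 -> +1, bit 1 -> -1, most significant bit first.
    """
    indices = np.arange(2 ** num_bits)
    bits = (indices[:, None] >> np.arange(num_bits - 1, -1, -1)) & 1
    return 1.0 - 2.0 * bits


def index_posteriors(received, prior, noise_var):
    """P(i | y) for every codebook index i, given soft BPSK samples y (AWGN)."""
    symbols = index_to_bpsk(received.shape[0])            # (K, B)
    # Gaussian log-likelihood of the received bits, up to an additive constant
    log_lik = -np.sum((received[None, :] - symbols) ** 2, axis=1) / (2.0 * noise_var)
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()                            # avoid underflow in exp
    post = np.exp(log_post)
    return post / post.sum()


def mmse_estimate(received, codebook, prior, noise_var):
    """MMSE estimate: posterior-weighted average of the codebook centroids."""
    return index_posteriors(received, prior, noise_var) @ codebook


if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    prior = np.full(4, 0.25)
    noisy = index_to_bpsk(2)[2] + np.array([0.3, -0.4])   # index 2 plus some noise
    print(mmse_estimate(noisy, codebook, prior, noise_var=0.5))
```

With a very large noise variance the posterior flattens and the estimate falls back to the prior mean of the codebook; with a very small one it approaches ordinary hard decoding of the index.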
In the case of DSR, the encoded parameters (MFCCs in the current version of the standard) differ from those normally used in speech coding. Moreover, the goal of DSR is completely different from that of coding for human vision or hearing, since the back-end is an automatic speech recognition system. Therefore, the development of specific mitigation algorithms for DSR is clearly justified. The ETSI DSR standard already includes a basic mitigation algorithm that has been shown to be quite effective for medium- and good-quality channels in TETRA and GSM environments [4]. Error mitigation can also be useful beyond DSR, in other applications such as speech reconstruction from the transmitted DSR features.

In this paper, we address the problem of mitigating channel errors by studying the performance of mitigation algorithms based on an MMSE (Minimum Mean Square Error) philosophy. In particular, we propose a new MMSE mitigation algorithm that uses the correct frames received before and after the frame being estimated. The proposed techniques are developed using the AURORA ETSI standard front-end, although they could be straightforwardly extended to other encoding schemes. They affect only the decoding stage of the ETSI standard. For the sake of simplicity, we assume BPSK modulation and test two different data channels (AWGN and bursty). The recognition experiments are performed on the Aurora-2 speech database.

The paper is organized as follows. First, we briefly summarize the ETSI DSR standard and its error mitigation algorithm. Sections 3 and 4 are devoted to the study of several mitigation techniques over AWGN and bursty channels, respectively. Finally, the conclusions of this work are summarized.
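The idea of using frames received before and after the current one can be sketched with a standard forward-backward recursion. In this hedged example, which is not the authors' exact formulation, the codebook indices of consecutive frames are assumed to follow a first-order Markov chain with a hypothetical initial distribution `init` and transition matrix `trans`; the emission terms are BPSK/AWGN likelihoods of each frame's received soft bits; and the MMSE estimate for frame t weights the centroids by P(i_t | y_1, ..., y_T).

```python
import numpy as np


def bpsk_table(num_bits):
    """BPSK symbols for every index (hypothetical: bit 0 -> +1, bit 1 -> -1, MSB first)."""
    idx = np.arange(2 ** num_bits)
    bits = (idx[:, None] >> np.arange(num_bits - 1, -1, -1)) & 1
    return 1.0 - 2.0 * bits


def emission_likelihoods(received, noise_var):
    """L[t, i], proportional to P(y_t | i_t = i) on an AWGN channel.

    received: (T, B) array of soft BPSK samples, one row per frame.
    Each row is rescaled by a constant, which does not change the posteriors.
    """
    symbols = bpsk_table(received.shape[1])                         # (K, B)
    sq_dist = ((received[:, None, :] - symbols[None, :, :]) ** 2).sum(axis=2)
    log_lik = -sq_dist / (2.0 * noise_var)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    return np.exp(log_lik)                                          # (T, K)


def forward_backward_mmse(received, codebook, init, trans, noise_var):
    """Per-frame MMSE estimates under a first-order Markov prior on indices.

    init: (K,) initial index probabilities.
    trans[j, i] = P(i_t = i | i_{t-1} = j).
    Returns (estimates of shape (T, D), posteriors of shape (T, K)).
    """
    lik = emission_likelihoods(received, noise_var)
    T, K = lik.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    # Forward pass, normalised at every step to avoid underflow
    alpha[0] = init * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass, also normalised (the scale factors cancel in gamma)
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)                       # P(i_t | y_1..y_T)
    return gamma @ codebook, gamma
```

For instance, if the middle frame of a three-frame sequence arrives with completely unreliable bits (all-zero soft values) while its neighbours are received cleanly, a "sticky" transition matrix lets the recursion recover it from the neighbouring frames, whereas a frame-by-frame MMSE estimate with a uniform prior would return the mean of the codebook.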
