In automatic speech recognition (ASR) of noise-corrupted speech, performance tends to deteriorate rapidly, and how rapidly depends on the choice of analysis method and distance measure. To evaluate recognition performance for several analysis methods and distance measures, a series of isolated-word recognition experiments was performed. The analysis methods selected were critical-band filtering, perceptually based linear prediction (PLP), linear prediction (LP), and time-synchronous linear prediction (SLP). The weighted Euclidean distance was applied in the cepstral domain with different weightings: unity, root power sums (RPS), and exponential liftering. Experiments were carried out on clean speech and under two noise conditions (white noise and low-pass filtered white noise, each added to the clean speech) at signal-to-noise ratios (SNRs) from 25 to 5 dB, using an alphanumeric vocabulary spoken by ten speakers. It is shown that the robustness of the recognizer in noise can be improved by a proper selection of the analysis method and of the cepstral weights used in the front end. Use of the general exponential lifter yields improvements over the RPS distance measure, which had previously been shown to be useful in noisy conditions with LP and PLP analyses [B. Hanson and H. Wakita, Proceedings ICASSP 86 (IEEE, New York, 1986), pp. 757–760].
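The weighted cepstral distance compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the energy term c_0 is assumed to be excluded, and the general exponential lifter is assumed to take the form w_n = n^s (the abstract gives no formula), so that s = 0 reduces to unity weighting and s = 1 to RPS weighting (RPS is equivalent to weighting each cepstral coefficient by its index n).

```python
import math
from typing import Sequence


def lifter_weights(order: int, kind: str = "exp", s: float = 1.0) -> list[float]:
    """Return cepstral weights w_1..w_order for indices n = 1..order.

    kind="unity": w_n = 1     (plain cepstral Euclidean distance)
    kind="rps":   w_n = n     (root-power-sums weighting)
    kind="exp":   w_n = n**s  (ASSUMED form of the general exponential lifter;
                               s = 0 gives unity, s = 1 gives RPS)
    """
    if kind == "unity":
        return [1.0] * order
    if kind == "rps":
        return [float(n) for n in range(1, order + 1)]
    if kind == "exp":
        return [float(n) ** s for n in range(1, order + 1)]
    raise ValueError(f"unknown lifter kind: {kind}")


def weighted_cepstral_distance(
    c1: Sequence[float], c2: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted Euclidean distance between cepstral vectors c_1..c_p.

    Each coefficient difference is scaled by its lifter weight before
    squaring; c_0 (frame energy) is assumed to be left out of the vectors.
    """
    if not (len(c1) == len(c2) == len(weights)):
        raise ValueError("cepstra and weights must have the same length")
    return math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(weights, c1, c2)))


if __name__ == "__main__":
    test_frame = [1.0, 0.5, 0.25]
    reference = [0.0, 0.0, 0.0]
    for label, w in [
        ("unity", lifter_weights(3, "unity")),
        ("rps", lifter_weights(3, "rps")),
        ("exp s=0.5", lifter_weights(3, "exp", 0.5)),
    ]:
        print(label, weighted_cepstral_distance(test_frame, reference, w))
```

Larger exponents s place more weight on the higher-order cepstral coefficients, which describe finer spectral detail; sweeping s is one way to explore the trade-off the abstract reports between unity, RPS, and exponential weightings. In a template-matching recognizer the squared distance is often used inside dynamic time warping instead; taking the square root does not change which template is nearest for a single frame pair.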