Real-time pre-processing for improved feature extraction of noisy speech

P P Raj

doi:10.1007/s10772-021-09835-x

Abstract

This paper proposes several improvements to the front-end feature extraction algorithms used for real-time speech decoding in noisy environments and demonstrates them on the TIMIT speech corpus. Real-Time Voice Activity Detection (RT-VAD) separates the voiced and unvoiced parts of the streaming speech input from silence. Novel techniques for RT-Zero Crossing Detection and RT-Pitch Detection are presented as part of RT-VAD. A real-time approximate Kalman filter is then applied to de-noise the incoming signal. All of these steps operate over a collection of speech frames called a context. Frame-based Linear Discriminant Analysis (LDA) feature extraction is performed using RT-Cepstral Mean and Variance Normalization (RT-CMVN) and RT-Splicing. The algorithms are tested on the TIMIT database at various noise levels. The word-error rate (WER) improves by 5% at 30 dB SNR and by 7% at 10 dB SNR, which validates the proposed algorithms. In comparison with other works, the approach also achieves a superior Speech Hit Rate (SHR) of 90.6% and Noise Hit Rate (NHR) of 86.2%.
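
To make the frame-level steps named in the abstract more concrete, the following is a minimal Python sketch of three textbook building blocks: a zero-crossing rate, mean and variance normalization over a sliding context of frames, and frame splicing. It is not the paper's RT-Zero Crossing Detection, RT-CMVN, or RT-Splicing. The names `zero_crossing_rate`, `SlidingCMVN`, and `splice`, the context length, and the edge-padding rule are all assumptions made for illustration.

```python
from collections import deque
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (generic definition)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


class SlidingCMVN:
    """Normalize each incoming feature vector with the mean and variance
    of the most recent `context` frames (hypothetical context length)."""

    def __init__(self, context=5, eps=1e-8):
        self.frames = deque(maxlen=context)  # oldest frame drops out automatically
        self.eps = eps                       # guards against division by zero

    def push(self, feat):
        self.frames.append(list(feat))
        n = len(self.frames)
        out = []
        for d in range(len(feat)):
            col = [f[d] for f in self.frames]
            mean = sum(col) / n
            var = sum((x - mean) ** 2 for x in col) / n
            out.append((feat[d] - mean) / math.sqrt(var + self.eps))
        return out


def splice(frames, left=1, right=1):
    """Concatenate each frame with its neighbours; edge frames are repeated
    where a neighbour is missing (an assumed padding rule)."""
    n = len(frames)
    spliced = []
    for t in range(n):
        vec = []
        for k in range(t - left, t + right + 1):
            vec.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(vec)
    return spliced
```

In this sketch `splice` works on an already buffered list of frames. A streaming version would have to wait for `right` future frames before emitting each output, which is one reason real-time pipelines often process a context of frames rather than single frames.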

Full Text