Abstract

In this paper, we present recent improvements to an earlier speech encoding system that combined spectral subtraction with voice activity detection (SS-VAD) and linear predictive coding (LPC). Background noise in speech data reduces audibility, quality, and intelligibility. In the previously proposed system, musical noise and other noise types degraded encoding performance under low signal-to-noise ratio (SNR) conditions. To address this issue, we propose a noise-robust encoding system that combines the minimum mean square error-spectrum power estimator based on zero crossing (MMSE-SPZC) with LPC. Performance is evaluated on noisy and enhanced encoded speech data using perceptual evaluation of speech quality (PESQ), composite measures (CM), and the normalized covariance metric (NCM). The proposed encoding system yields better PESQ, CM, and NCM values than the earlier SS-VAD and LPC-based system for various noise types at different SNR levels. In addition to the performance evaluation, we also report the computational complexity and memory requirements of both the proposed and existing algorithms. The source code of the algorithms used in this work is publicly available.

