In recent years, linear prediction voice encoders have become very efficient in terms of computing execution time and channel bandwidth usage while providing, in the absence of impulsive noise, natural sounding synthetic speech signals. This good performance has been achieved via the use of a maximum likelihood parameter estimation of an auto-regressive model of order ten that best fits the speech signal under the assumption that the signal and the noise are Gaussian stochastic processes. However, this method breaks down in the presence of impulse noise, which is common in practice, resulting in harsh or non-intelligible audio signals. In this paper, we propose a robust estimator of correlation, the Phase-Phase correlator that is able to cope with impulsive noise. Utilizing this correlator, we develop a Robust Mixed Excitation Linear Prediction encoder that provides improved audio quality for voiced, unvoiced, and transition speech segments. This is achieved by applying a statistical test to robust Mahalanobis distances for identifying the outliers in the corrupted speech signal, which are then replaced with filtered signals. Simulation results reveal that the proposed estimator of correlator outperforms in variance, bias, and breakdown point compared to three other robust approaches based on the arcsin law, the polarity coincidence correlator, and the median-of-ratio estimator without sacrificing the encoder bandwidth efficiency and the compression gain while remaining compatible with real-time applications. Furthermore, in the presence of impulsive noise, the proposed speech encoder speech subjective quality outperforms the state-of-the-art in terms of mean opinion score.
Read full abstract