Abstract

This paper proposes a method for estimating pitch from noise-corrupted speech by extracting a pitch harmonic and its corresponding harmonic number. Starting from the harmonic representation of clean speech, a simple yet accurate harmonic sinusoidal autocorrelation (HSAC) model is first derived. Using this HSAC model, which is expressed in terms of the pitch harmonics of the clean speech, a new autocorrelation-domain least-squares fitting technique is developed to extract a pitch harmonic from the noisy speech. The harmonic number associated with that pitch harmonic is then determined by maximizing an objective function formulated as an impulse-train-weighted symmetric average magnitude sum function (SAMSF) of the noisy speech. The period of the impulse train is governed by the estimated pitch harmonic, and the objective function is maximized by matching, in the time domain, the periodicity of the impulse train with that of the SAMSF. A SAMSF-based pitch tracking scheme using dynamic programming is devised to obtain a smoothed pitch contour. To demonstrate the efficacy of the proposed method, simulations are conducted on naturally spoken speech signals corrupted by white or multi-talker babble noise at different signal-to-noise ratio (SNR) levels. A comprehensive evaluation of the pitch estimation results shows that the proposed method outperforms some state-of-the-art methods at low SNR levels.
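As a rough illustration of the harmonic-number step, the following Python sketch scores each candidate harmonic number by summing a periodicity function at the impulse positions implied by that candidate. It is not the authors' implementation. The abstract does not define the SAMSF, so a normalized autocorrelation is assumed here as a stand-in periodicity function, and the pitch harmonic is assumed to be already known rather than obtained by the HSAC least-squares fit. The names `periodicity`, `pick_harmonic_number`, `max_number`, and `f0_min` are hypothetical.

```python
import numpy as np


def periodicity(x, max_lag):
    """Normalized autocorrelation for lags 0..max_lag.

    Assumption: used only as a stand-in for the SAMSF, which the
    abstract names but does not define.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / (n - k)
                  for k in range(max_lag + 1)])
    return r / r[0]


def pick_harmonic_number(x, fs, harmonic_hz, max_number=8, f0_min=60.0):
    """Choose the harmonic number m so that harmonic_hz / m is the pitch.

    For each candidate m, an impulse train with period m * fs / harmonic_hz
    samples is laid over the periodicity function, and the values at the
    impulse positions are summed. The candidate with the largest sum wins.
    max_number and f0_min are hypothetical search limits.
    """
    max_lag = int(fs / f0_min)
    r = periodicity(x, max_lag)
    best_m, best_score = 1, -np.inf
    for m in range(1, max_number + 1):
        period = m * fs / harmonic_hz          # candidate pitch period in samples
        count = int(max_lag // period)         # impulses that fit in the lag range
        if count < 1:
            break
        lags = np.rint(np.arange(1, count + 1) * period).astype(int)
        score = r[lags].sum()                  # impulse-train weighted sum
        if score > best_score:                 # strict '>' keeps the smaller m on ties
            best_m, best_score = m, score
    return best_m, harmonic_hz / best_m
```

Summing rather than averaging is a deliberate choice in this sketch: a candidate period that is a multiple of the true period also lands on peaks, but fewer of them fit in the lag range, so its sum is smaller.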
