Abstract
Speech enhancement algorithms have been employed successfully in many areas, such as VoIP, automatic speech recognition, and speaker verification, and many approaches have been presented in the literature. This thesis focuses on enhancing single-channel speech degraded by white or colored noise. A Kalman filter algorithm combined with the masking properties of the human auditory system is proposed. The threshold computed from the masking properties is used as a constraint in the Kalman filter to derive a modified Kalman filter theoretically. This derivation gives a theoretical foundation for the feasibility of combining masking properties with a Kalman filter. Some heuristic methods are also proposed for easier implementation. One algorithm uses the frequency-domain masking level as a hard threshold to reshape the Kalman-filtered signal. Another concatenates a post-filter with the Kalman filter, using a threshold that takes both time-domain and frequency-domain masking properties into account. The goal of the masking is to keep the energy of the state estimation error below the threshold. To further decrease the computational cost, a wavelet Kalman filter combined with masking thresholds is also introduced. In the above algorithms, the speech model is assumed to be linear. Nonlinear speech models are also considered in the thesis. To address the nonlinear model problem, dual Extended Kalman Filter (EKF) and dual Unscented Kalman Filter (UKF) algorithms are studied. In these cases, both time-domain and frequency-domain masking properties are taken into account. Simulation results show that all the proposed methods combining the Kalman filter and masking properties produce promising results in terms of PESQ scores, with average PESQ score gains ranging from about 0.35 to 0.45. Informal subjective tests also indicate that the performance of the proposed methods is promising.
No voice activity detection is required in the proposed methods.
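As a rough illustration of the linear-model Kalman filtering that the proposed methods build on, the sketch below runs a scalar Kalman filter on a synthetic first-order autoregressive signal in white noise. It is not the algorithm of this thesis: the AR(1) model, the parameter values, and the names `kalman_ar1`, `simulate`, and `mse` are assumptions chosen for brevity, and no masking threshold is applied.

```python
import random


def kalman_ar1(noisy, a, q, r):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w[k], y[k] = x[k] + v[k].

    a: AR(1) coefficient (assumed |a| < 1)
    q: process-noise variance, r: measurement-noise variance
    Returns the list of filtered state estimates.
    """
    x_est = 0.0
    p_est = q / (1.0 - a * a)  # assumed prior: stationary signal variance
    estimates = []
    for y in noisy:
        # Prediction step: propagate the estimate and its error variance.
        x_pred = a * x_est
        p_pred = a * a * p_est + q
        # Update step: blend prediction and observation via the Kalman gain.
        gain = p_pred / (p_pred + r)
        x_est = x_pred + gain * (y - x_pred)
        p_est = (1.0 - gain) * p_pred
        estimates.append(x_est)
    return estimates


def simulate(n, a, q, r, seed=0):
    """Generate a synthetic AR(1) 'clean' signal and a white-noise-corrupted copy."""
    rng = random.Random(seed)
    clean, noisy = [], []
    x = 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        clean.append(x)
        noisy.append(x + rng.gauss(0.0, r ** 0.5))
    return clean, noisy


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / len(u)


# Hypothetical parameter values, used only for this demonstration.
clean, noisy = simulate(2000, a=0.9, q=1.0, r=1.0)
filtered = kalman_ar1(noisy, a=0.9, q=1.0, r=1.0)
print("MSE before filtering:", mse(clean, noisy))
print("MSE after filtering: ", mse(clean, filtered))
```

In this toy setting the model parameters are known exactly. A practical speech enhancer would have to estimate them from the noisy signal and would use a higher-order model, and the masking-based constraint or post-filter described above would be added on top of this recursion.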