Numerous efforts have focused on the problem of reducing the impact of noise on the performance of various speech systems such as speech recognition, speaker recognition, and speech coding. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. This study presents an alternative viewpoint by approaching the same problem from the noise perspective. Here, a framework is developed to analyze and use the noise information available for improving performance of speech systems. The proposed framework focuses on explicitly modeling the noise and its impact on speech system performance in the context of speech enhancement. The framework is then employed for development of a novel noise tracking algorithm for achieving better speech enhancement under highly evolving noise types. The first part of this study employs a noise update rate in conjunction with a target enhancement algorithm to evaluate the need for tracking in many enhancement algorithms. It is shown that noise tracking is more beneficial in some environments than others. This is evaluated using the Log-MMSE enhancement scheme for a corpus of four noise types consisting of Babble (BAB), White Gaussian (WGN), Aircraft Cockpit (ACN), and Highway Car (CAR) using the Itakura-Saito (IS) (Gray et al. in IEEE Trans. Acoust. Speech Signal Process. 28:367---376, 1980) quality measure. A test set of 200 speech utterances from the TIMIT corpus are used for evaluations. The new Environmentally Aware Noise Tracking (EA-NT) method is shown to be superior in comparison with the contemporary noise tracking algorithms. Evaluations are performed for speech degraded using a corpus of four noise types consisting of: Babble (BAB), Machine Gun (MGN), Large Crowd (LCR), and White Gaussian (WGN). Unlike existing approaches, this study provides an effective foundation for addressing noise in speech by emphasizing noise modeling so that available resources can be used to achieve more reliable overall performance in speech systems.
Read full abstract