Abstract
Many speech processing algorithms and applications rely on the explicit knowledge of signal-to-noise ratio (SNR) in their design and implementation. Estimating the SNR of a signal can enhance the performance of such technologies. We propose a novel method for estimating the long-term SNR of speech signals based on features, from which we can approximately detect regions of speech presence in a noisy signal. By measuring the energy in these regions, we create sets of energy ratios, from which we train regression models for different types of noise. If the type of noise that corrupts a signal is known, we use the corresponding regression model to estimate the SNR. When the noise is unknown, we use a deep neural network to find the “closest” regression model to estimate the SNR. Evaluations were done based on the TIMIT speech corpus, using noises from the NOISEX-92 noise database. Furthermore, we performed cross-corpora experiments by training on TIMIT and NOISEX-92 and testing on the Wall Street Journal speech corpus and DEMAND noise database. Our results show that our system provides accurate SNR estimations across different noise types, corpora, and that it outperforms other SNR estimation methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.