Abstract
We propose time-frequency domain methods for noise estimation and speech enhancement. A speech presence detection method is used to find connected time-frequency regions of speech presence. These regions are used by a noise estimation method, and both the speech presence decisions and the noise estimate are used in the speech enhancement method. Different attenuation rules are applied to regions with and without speech presence to achieve enhanced speech with natural-sounding, attenuated background noise. The proposed speech enhancement method has a computational complexity that makes it feasible for application in hearing aids. An informal listening test shows that the proposed speech enhancement method has significantly higher mean opinion scores than minimum mean-square error log-spectral amplitude (MMSE-LSA) and decision-directed MMSE-LSA.
Highlights
The performance of many speech enhancement methods relies mainly on the quality of a noise power spectral density (PSD) estimate.
By removing the power in speech absence regions and speech presence regions from the noisy speech periodogram, we see in Figures 4a and 4b, respectively, that most of the speech detectable by visual inspection has been detected by the proposed algorithm.
We evaluate the performance of the noise estimation methods by means of their spectral distortion, which we measure as segmental noise-to-error ratios (SegNERs).
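The SegNER formula is not given in this excerpt, so the following is only a minimal sketch of one plausible definition, built by analogy with segmental SNR: a per-frame ratio of noise energy to estimation-error energy in dB, averaged over frames. The function name `segmental_ner`, the amplitude-domain error, and the clipping limits `floor_db` and `ceil_db` are all assumptions for illustration, not the paper's definition.

```python
import numpy as np


def segmental_ner(noise_psd, noise_psd_est, floor_db=-10.0, ceil_db=35.0):
    """Frame-averaged noise-to-error ratio in dB (illustrative definition).

    noise_psd, noise_psd_est: arrays of shape (frames, bins) holding
    non-negative power values for the true and estimated noise.
    floor_db, ceil_db: assumed per-frame clipping limits, borrowed from
    common segmental-SNR practice.
    """
    noise_psd = np.asarray(noise_psd, dtype=float)
    noise_psd_est = np.asarray(noise_psd_est, dtype=float)
    eps = 1e-12  # avoids division by zero and log of zero

    # Energy of the true noise in each frame.
    noise_energy = noise_psd.sum(axis=1)
    # Assumption: the error is measured between spectral amplitudes.
    amp_error = np.sqrt(noise_psd) - np.sqrt(noise_psd_est)
    error_energy = (amp_error ** 2).sum(axis=1)

    ner_db = 10.0 * np.log10((noise_energy + eps) / (error_energy + eps))
    # Clip each frame so a few extreme frames do not dominate the mean.
    ner_db = np.clip(ner_db, floor_db, ceil_db)
    return float(ner_db.mean())
```

Under this definition a higher value means the noise estimate is closer to the true noise; a perfect estimate saturates at `ceil_db`.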
Summary
The performance of many speech enhancement methods relies mainly on the quality of a noise power spectral density (PSD) estimate. When the noise estimate differs from the true noise, it will lead to artifacts in the enhanced speech. Our aim is to exploit spectral and temporal masking mechanisms in the human auditory system [1] to reduce the perception of these artifacts in speech presence regions and eliminate the artifacts in speech absence regions. We achieve this by leaving downscaled, natural-sounding background noise in the enhanced speech in connected time-frequency regions with speech absence. The downscaled, natural-sounding background noise will spectrally and temporally mask artifacts in the speech estimate while preserving the naturalness of the background noise.
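The idea of applying different attenuation rules to the two kinds of regions can be sketched as follows. This is a hedged illustration, not the paper's rule: speech absence regions get a constant gain, so the background noise is scaled down but keeps its natural character, while speech presence regions get a simple power-subtraction gain that is swapped in here as a stand-in. The function name `apply_region_gains`, the parameter `absence_gain`, and the choice to floor the speech-region gain at `absence_gain` are all assumptions.

```python
import numpy as np


def apply_region_gains(noisy_stft, speech_present, noise_psd, absence_gain=0.2):
    """Apply region-dependent gains to a noisy STFT (illustrative sketch).

    noisy_stft: complex array, shape (frames, bins).
    speech_present: boolean array of the same shape, True where speech
        was detected (assumed to come from a separate detector).
    noise_psd: non-negative noise power estimate, same shape.
    absence_gain: assumed constant gain for speech absence regions.
    """
    noisy_stft = np.asarray(noisy_stft)
    speech_present = np.asarray(speech_present, dtype=bool)
    noise_psd = np.asarray(noise_psd, dtype=float)

    noisy_power = np.abs(noisy_stft) ** 2
    # Stand-in rule for speech regions: power-subtraction gain.
    presence_gain = 1.0 - noise_psd / np.maximum(noisy_power, 1e-12)
    # Assumption: never attenuate speech regions more than the background.
    presence_gain = np.clip(presence_gain, absence_gain, 1.0)

    # Constant gain where speech is absent, so the noise is only downscaled.
    gain = np.where(speech_present, presence_gain, absence_gain)
    return gain * noisy_stft
```

Because the gain in speech absence regions is constant, the residual noise there is an exact scaled copy of the original background, which is what keeps it free of estimation artifacts in this sketch.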