Abstract

Speech enhancement (SE) aims to improve the intelligibility and perceptual quality of speech contaminated by noise through spectral or temporal modifications. Deep learning models typically achieve enhancement by estimating the magnitude spectrum of clean speech. This paper proposes a novel, computationally efficient deep learning model for enhancing noisy speech. The model pre-processes the noisy magnitude spectrum by redistributing energy from high-energy voiced segments to low-energy unvoiced segments using an adaptive power-law transformation, while keeping the total energy of the speech signal constant. A U-shaped fuzzy long short-term memory (UFLSTM) network then estimates the magnitude of a time-frequency (T-F) mask from the pre-processed data. Residual connections between similarly shaped layers are added to mitigate gradient decay, and an attention mechanism is incorporated by modifying the forget gate of the UFLSTM. To keep the speech enhancement system causal, processing uses no future audio frames. We compare the proposed system against other deep learning models in various noisy environments at signal-to-noise ratios of 0 dB, 5 dB, and 10 dB. The experiments show that the proposed SE system outperforms the competing deep learning models and considerably improves speech intelligibility and quality. On the LibriSpeech database, STOI and PESQ improve over noisy speech by 0.211 (21.1%) and 0.95 (36.39%), respectively, under seen noise conditions, and by 0.199 (19.9%) and 0.94 (35.69%) under unseen noise conditions. Furthermore, cross-corpus analysis shows that the proposed SE system performs better when trained on the DNS dataset than on the LibriSpeech, VoiceBank, or TIMIT datasets.
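
The energy-redistribution pre-processing described above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's exact formulation: it assumes a fixed compression exponent alpha (the paper's transformation is adaptive), applies a power law to the magnitude spectrogram so that high-energy voiced frames are compressed more than low-energy unvoiced frames, and rescales the result so the total energy is unchanged. The function name `power_law_redistribute` and all parameter choices are hypothetical.

```python
import numpy as np

def power_law_redistribute(mag, alpha=0.5, eps=1e-12):
    """Compress a magnitude spectrogram with |X|**alpha (alpha < 1 boosts
    low-energy bins relative to high-energy ones), then rescale so the
    total energy (sum of squared magnitudes) matches the input.

    mag   : 2-D array, frames x frequency bins, non-negative magnitudes
    alpha : compression exponent; the paper adapts this per segment,
            here it is fixed for simplicity (an assumption)
    """
    compressed = mag ** alpha
    # Global rescaling keeps the total signal energy constant, as the
    # abstract requires, while relative energy shifts from voiced
    # (high-energy) toward unvoiced (low-energy) regions.
    scale = np.sqrt(np.sum(mag ** 2) / (np.sum(compressed ** 2) + eps))
    return compressed * scale

# Usage: the energy-preservation invariant holds exactly.
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((100, 257)))   # toy spectrogram
out = power_law_redistribute(mag, alpha=0.5)
assert np.isclose(np.sum(out ** 2), np.sum(mag ** 2))
```

In the actual system, the exponent would presumably be adapted per segment (e.g., from a voiced/unvoiced decision) and the transformed magnitudes would feed the causal UFLSTM mask estimator; those details are beyond this sketch.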
