Abstract
Recent advances in speech enhancement have seen the emergence of generator-based methods. However, several of these approaches handle input variability poorly: they either excel at low signal-to-noise ratios (SNRs) by relying on intricate representations of noisy and clean speech, or perform well only at higher SNRs. In this work, we investigate speech enhancement with a Dilated Attention Fast Generative Adversarial Network (DAF-GAN). The proposed DAF-GAN framework maintains stable performance across different SNR conditions by efficiently processing long input signals. The DAF-GAN features a dilated, patch-based discriminator. The generator incorporates multiple decoders and attention gates on the skip connections, strategically integrated within the Fast-U-Net model to optimize processing speed. At test time, an ideal ratio mask further refines the enhanced signal by emphasizing the target speech while suppressing residual noise and artifacts. DAF-GAN performance was assessed using objective metrics such as PESQ on several noisy-speech databases. Results revealed that the DAF-GAN performed modestly in comparison with state-of-the-art models; for example, on the VoiceBank-DEMAND dataset it achieved a PESQ score of 2.50.
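The ideal ratio mask mentioned above is a standard time-frequency masking technique. A minimal NumPy sketch of the textbook formulation is shown below; this illustrates the general concept only and is not the authors' implementation (the function names and the `eps` stabilizer are illustrative assumptions):

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Textbook IRM: per time-frequency bin, sqrt of the ratio of clean
    speech energy to total (clean + noise) energy. Values lie in [0, 1].
    `eps` is a small constant (an illustrative choice) to avoid 0/0."""
    return np.sqrt(clean_mag ** 2 / (clean_mag ** 2 + noise_mag ** 2 + eps))

def apply_mask(noisy_mag, mask):
    """Element-wise masking of the noisy magnitude spectrogram:
    bins dominated by speech are kept, noise-dominated bins attenuated."""
    return noisy_mag * mask
```

In a bin with no noise the mask approaches 1 (the bin passes through unchanged), and in a bin where clean and noise energies are equal it equals sqrt(0.5), attenuating that bin by about 3 dB.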