Recent advances in speech enhancement have witnessed the emergence of generator-based methods. However, many of these approaches struggle with input variation: they either excel at low signal-to-noise ratios (SNRs) by relying on intricate representations of noisy and clean speech, or perform well only at higher SNRs. In this work, we investigated speech enhancement using a Dilated Attention Fast Generative Adversarial Network (DAF-GAN). The proposed DAF-GAN framework maintains stable performance across SNR conditions by efficiently processing long signals. The DAF-GAN features a dilated, patch-based discriminator. The generator incorporates multi-decoding and attention gates via skip connections, integrated within the Fast-U-Net model to improve processing speed. At test time, an ideal ratio mask further refines the enhanced signal by emphasizing the target speech while suppressing residual noise and artifacts. DAF-GAN performance was assessed with objective metrics such as PESQ on several noisy speech databases. Results show that the DAF-GAN performs modestly compared with state-of-the-art models; for example, on the VoiceBank-DEMAND dataset it achieved a PESQ score of 2.50.
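The ideal ratio mask (IRM) mentioned above is a standard time-frequency masking technique; the sketch below is a generic NumPy illustration of the concept, not the authors' implementation. The magnitudes, the small stabilizing epsilon, and the toy assumption that noisy magnitudes are the sum of clean and noise magnitudes are all assumptions for illustration.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """IRM: ratio of clean energy to total energy per time-frequency bin."""
    return clean_mag**2 / (clean_mag**2 + noise_mag**2 + eps)

# Toy magnitude spectra for three time-frequency bins.
clean = np.array([1.0, 0.5, 0.0])
noise = np.array([0.0, 0.5, 1.0])

mask = ideal_ratio_mask(clean, noise)      # -> values in [0, 1]
noisy = clean + noise                      # toy assumption: magnitudes add
enhanced = mask * noisy                    # mask suppresses noise-dominated bins
```

Bins dominated by clean speech receive a mask near 1 and pass through; noise-dominated bins receive a mask near 0 and are attenuated.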