Abstract
This paper presents a denoising and dereverberation hierarchical neural vocoder (DNR-HiNet) to convert noisy and reverberant acoustic features into a clean speech waveform. We implement it mainly by modifying the amplitude spectrum predictor (ASP) in the original HiNet vocoder. This modified denoising and dereverberation ASP (DNR-ASP) can predict clean log amplitude spectra (LAS) from input degraded acoustic features. To achieve this, the DNR-ASP first predicts the noisy and reverberant LAS, noise LAS related to the noise information, and room impulse response related to the reverberation information then performs initial denoising and dereverberation. The initial processed LAS are then enhanced by another neural network as the final clean LAS. To further improve the quality of the generated clean LAS, we also introduce a bandwidth extension model and frequency resolution extension model in the DNR-ASP. The experimental results indicate that the DNR-HiNet vocoder was able to generate a denoised and dereverberated waveform given noisy and reverberant acoustic features and outperformed the original HiNet vocoder and a few other neural vocoders. We also applied the DNR-HiNet vocoder to speech enhancement tasks, and its performance was competitive with several advanced speech enhancement methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.