Abstract
In this paper, we study aspects of single microphone speech enhancement (SE) based on deep neural networks (DNNs). Specifically, we explore the generalizability capabilities of state-of-the-art DNN-based SE systems with respect to the background noise type, the gender of the target speaker, and the signal-to-noise ratio (SNR). Furthermore, we investigate how specialized DNN-based SE systems, which have been trained to be either noise type specific, speaker specific or SNR specific, perform relative to DNN based SE systems that have been trained to be noise type general, speaker general, and SNR general. Finally, we compare how a DNN-based SE system trained to be noise type general, speaker general, and SNR general performs relative to a state-of-the-art short-time spectral amplitude minimum mean square error (STSA-MMSE) based SE algorithm. We show that DNN-based SE systems, when trained specifically to handle certain speakers, noise types and SNRs, are capable of achieving large improvements in estimated speech quality (SQ) and speech intelligibility (SI), when tested in matched conditions. Furthermore, we show that improvements in estimated SQ and SI can be achieved by a DNN-based SE system when exposed to unseen speakers, genders and noise types, given a large number of speakers and noise types have been used in the training of the system. In addition, we show that a DNN-based SE system that has been trained using a large number of speakers and a wide range of noise types outperforms a state-of-the-art STSA-MMSE based SE method, when tested using a range of unseen speakers and noise types. Finally, a listening test using several DNN-based SE systems tested in unseen speaker conditions show that these systems can improve SI for some SNR and noise type configurations but degrade SI for others.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.