Abstract

Data-driven deep learning solutions built on gradient-based neural architectures have proven useful in overcoming some limitations of traditional signal-processing techniques. However, a large number of reverberant–anechoic training utterance pairs covering as many environmental conditions as possible is required to achieve robust dereverberation performance in unseen testing conditions. In this article, we address this data requirement while preserving the advantages of deep neural structures by leveraging hierarchical extreme learning machines (HELMs), which are not gradient-based neural architectures. In particular, an ensemble HELM learning framework is established to effectively recover anechoic speech from its reverberant counterpart through spectral mapping. In addition to the ensemble learning framework, we derive two novel HELM models, namely highway HELM [HELM(Hwy)] and residual HELM [HELM(Res)], both of which incorporate low-level features to enrich the information available for spectral mapping. We evaluated the proposed ensemble learning framework with simulated and measured impulse responses using the Texas Instruments and Massachusetts Institute of Technology (TIMIT), Mandarin hearing in noise test (MHINT), and reverberant voice enhancement and recognition benchmark (REVERB) corpora. The experimental results show that the proposed framework outperforms both traditional methods and a recently proposed integrated deep and ensemble learning algorithm in terms of standardized objective and subjective evaluations under matched and mismatched testing conditions for simulated and measured impulse responses.
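As a rough illustration of the non-gradient training that HELM builds on, the sketch below fits a single extreme learning machine (ELM) layer whose hidden weights are random and fixed and whose output weights are solved in closed form. The feature shapes, regularization value, and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def train_elm_layer(X, T, n_hidden=1024, reg=1e-3, seed=0):
    """Fit one ELM layer: random fixed hidden weights, closed-form output weights.

    X: (n_frames, n_in) reverberant log-spectral features (assumed layout).
    T: (n_frames, n_out) anechoic log-spectral targets.
    No gradient descent is involved; beta is a ridge-regression solution.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # random nonlinear projection of the input frames
    # Regularized least squares: beta = (H^T H + reg*I)^-1 H^T T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_map(X, W, b, beta):
    """Apply the trained layer to map reverberant features toward anechoic ones."""
    return np.tanh(X @ W + b) @ beta
```

Stacking several such layers would give a hierarchical ELM; the highway and residual variants described above additionally feed low-level input features forward to later layers, with the exact wiring specified in the paper itself.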

Highlights

  • Reverberation refers to the collection of reflected sounds from surfaces (e.g., walls and objects) in an acoustic enclosure

  • Motivated by the promising performance attained by hierarchical extreme learning machines (HELMs) for speech enhancement, we extend our research to HELM-based speech dereverberation by incorporating ensemble learning for spectral mapping from reverberant to anechoic speech


Summary

Introduction

Reverberation refers to the collection of reflected sounds from surfaces (e.g., walls and objects) in an acoustic enclosure. It has been shown to severely deteriorate the quality and intelligibility of speech signals for both human and machine listeners. Such deterioration can substantially affect the performance of speech-related applications, for instance, automatic speech recognition [1]–[3] and speaker identification systems [4]–[6]. One group of traditional dereverberation algorithms is based on homomorphic transformation, in which the reverberant speech signal is analyzed in the cepstral or spectral domain so that the reverberation can be subtracted from the signal. Nonlinear spectral mapping approaches have also been developed to address the reverberation problem. In these approaches, artificial neural networks (ANNs) are generally used to learn the mapping function between reverberant and anechoic speech [21]. In [25]–[29], deep neural network (DNN)-based solutions have been proposed to perform this mapping.
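To make the notion of spectral mapping concrete, the following sketch extracts frame-wise log-magnitude spectra from a reverberant–anechoic utterance pair, the kind of input–target representation such mapping networks are typically trained on. The frame sizes and helper names are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from scipy.signal import stft

def log_magnitude_features(wave, fs=16000, nperseg=512, noverlap=256):
    """Frame-wise log-magnitude spectra, shape (n_frames, n_bins)."""
    _, _, Z = stft(wave, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.log(np.abs(Z).T + 1e-8)

# Hypothetical training pair: the mapper (ANN, DNN, or HELM) learns to predict
# anechoic frames T from time-aligned reverberant frames X.
# X = log_magnitude_features(reverberant_wave)
# T = log_magnitude_features(anechoic_wave)
```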
