Abstract

This study investigates the use of complex operations for multichannel speech enhancement in the time domain with a neural network. Previous studies have demonstrated the advantages of incorporating complex operations when designing neural networks; however, they have focused solely on frequency-domain enhancement techniques. In contrast, we present an end-to-end approach to speech enhancement in the time domain. We used the Hilbert transform to generate complex time-domain waveforms as inputs to the network, which allowed the end-to-end approach to exploit spatial information. To handle the complex-valued inputs, we developed a complex neural adaptive beamformer (CNAB), which uses complex shared long short-term memory (LSTM), split-LSTM, and complex convolutions to generate the beamforming output. We then developed a complex full convolutional network (CFCN) to enhance the beamforming output, leveraging complex dilated convolutions to model the long-term temporal dependencies of speech. By cascading the CNAB and CFCN, we created the final end-to-end time-domain enhancement network, named CNABCFCN. We trained and tested CNABCFCN on the deep noise suppression (DNS) challenge dataset. Our results show that using complex operations improves on the baseline model. Furthermore, the proposed CNABCFCN outperformed the other networks in the comparison on both objective and subjective measures.
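
The two signal-level ideas named in the abstract, a complex time-domain waveform obtained with the Hilbert transform and a complex dilated convolution, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `analytic_signal`, `real_dilated_conv1d`, and `complex_dilated_conv1d` are hypothetical, the naive DFT is chosen only to keep the example dependency-free, and the causal zero padding is an assumption that the abstract does not state. In a multichannel setting, the analytic signal would presumably be computed separately for each microphone channel.

```python
import cmath
import math


def analytic_signal(x):
    """Return the complex analytic signal x + j*H{x} of a real sequence.

    Uses a naive O(n^2) DFT, which is only suitable for short illustrations.
    """
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # One-sided spectrum: keep DC (and Nyquist when n is even),
    # double the positive frequencies, zero the negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    # Inverse DFT of the one-sided spectrum gives the analytic signal.
    return [
        sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def real_dilated_conv1d(x, w, dilation):
    """Causal dilated convolution of real sequences (assumed zero left padding)."""
    return [
        sum(w[i] * x[t - i * dilation]
            for i in range(len(w)) if t - i * dilation >= 0)
        for t in range(len(x))
    ]


def complex_dilated_conv1d(z, w, dilation=1):
    """Complex dilated convolution built from four real convolutions.

    With z = zr + j*zi and w = wr + j*wi:
      real part = wr*zr - wi*zi,  imaginary part = wr*zi + wi*zr
    where * denotes the real dilated convolution above.
    """
    zr, zi = [v.real for v in z], [v.imag for v in z]
    wr, wi = [v.real for v in w], [v.imag for v in w]
    rr = real_dilated_conv1d(zr, wr, dilation)
    ii = real_dilated_conv1d(zi, wi, dilation)
    ri = real_dilated_conv1d(zi, wr, dilation)
    ir = real_dilated_conv1d(zr, wi, dilation)
    return [complex(a - b, c + d) for a, b, c, d in zip(rr, ii, ri, ir)]


if __name__ == "__main__":
    n = 32
    x = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
    z = analytic_signal(x)               # complex waveform fed to the network
    kernel = [0.5 + 0.5j, -0.25j]        # arbitrary illustrative weights
    y = complex_dilated_conv1d(z, kernel, dilation=2)
    print(len(z), len(y))
```

For a pure cosine that completes a whole number of cycles in the window, the real part of the analytic signal reproduces the input and the imaginary part is the matching sine, so the complex waveform carries phase explicitly. In practice a library routine such as `scipy.signal.hilbert` would replace the naive DFT, and the four real convolutions would be learnable layers rather than fixed kernels.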
