Abstract

To address the problem that convolutional recurrent neural network (CRNN) speech enhancement methods cannot efficiently extract speech harmonic information, this paper proposes a Tibetan speech enhancement method based on an Attention-mechanism Cascaded Convolutional Recurrent Neural network (ACRNN) model. First, the input time-domain speech signal is converted into a frequency-domain signal using the short-time Fourier transform. Second, harmonic characteristics are extracted by a harmonic-perception attention mechanism module, and the resulting attention tensor is spliced with the original speech spectrum and fed into the CRNN network for speech enhancement. Finally, the inverse short-time Fourier transform converts the signal from the frequency domain back to the time domain, generating the denoised speech waveform. Experimental results show that, compared with using the CRNN network directly for speech denoising, the proposed method improves speech quality and intelligibility.
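The pipeline described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the `harmonic_attention` function is a hypothetical stand-in for the harmonic-perception attention module, and a simple masking step stands in for the CRNN; only the STFT → splice → iSTFT plumbing follows the abstract.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Frame the signal with a Hann window and take per-frame FFTs."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)  # (frames, bins)

def istft(spec, n_fft=512, hop=256):
    """Overlap-add inverse STFT with the same Hann window."""
    win = np.hanning(n_fft)
    out = np.zeros(hop * (spec.shape[0] - 1) + n_fft)
    norm = np.zeros_like(out)
    for t, frame in enumerate(np.fft.irfft(spec, n=n_fft, axis=1)):
        out[t * hop:t * hop + n_fft] += frame * win
        norm[t * hop:t * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def harmonic_attention(mag):
    """Hypothetical stand-in for the harmonic-perception attention
    module: per-frame weights emphasising high-energy (harmonic) bins."""
    return mag / (mag.max(axis=1, keepdims=True) + 1e-8)

# Noisy time-domain input (1 s at 16 kHz, random placeholder signal).
x = np.random.randn(16000)
spec = stft(x)
mag, phase = np.abs(spec), np.angle(spec)

# Splice the attention tensor with the spectrum along a channel axis;
# in the paper this stacked tensor would be the CRNN's input.
attn = harmonic_attention(mag)
crnn_input = np.stack([mag, attn], axis=0)  # (2, frames, bins)

# Placeholder "enhancement": apply the attention weights as a mask,
# then reconstruct the waveform with the noisy phase.
enhanced_mag = mag * attn
y = istft(enhanced_mag * np.exp(1j * phase))
```

The splice along a channel axis mirrors the abstract's description of concatenating the attention tensor with the original spectrum before the CRNN stage.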
