Abstract

At present, Transformer architectures based on the self-attention mechanism have attracted wide attention due to their excellent performance in many fields, such as natural language processing and computer vision. However, in the field of speech processing, the potential of Transformer architectures has not been fully exploited. The main reasons are that the position embedding mechanism in the Transformer cannot effectively model sequences in speech tasks, and that the Transformer itself is difficult to train to convergence. Inspired by recent research, we propose a new speech enhancement model in this paper. Specifically, a recurrent neural network is used to replace the position embedding in the Transformer, and the fully connected layers in the Transformer feed-forward sublayer are replaced with one-dimensional convolutions. In addition, the model introduces a weighted residual connection mechanism. Experimental results show that, compared with baseline models, the new model achieves better speech quality and speech intelligibility. Convergence and inference speed are also substantially improved: compared with the baseline model, the processing speed of the proposed method is 4.27 times higher.
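The abstract does not give the exact formulation of the convolutional feed-forward sublayer or the weighted residual connection, so the following is only an illustrative NumPy sketch under stated assumptions: the fully connected layers are replaced by 1-D convolutions along the time axis with "same" padding, and the residual branch is scaled by a hypothetical scalar `res_weight` (the function names, shapes, and the form `out = res_weight * x + conv2(relu(conv1(x)))` are assumptions, not the paper's definitions).

```python
import numpy as np

def conv1d(x, w, b):
    """1-D convolution along the time axis with 'same' padding.
    x: (T, C_in), w: (k, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T = x.shape[0]
    out = np.zeros((T, w.shape[2]))
    for t in range(T):
        # mix a window of k frames into C_out output channels
        out[t] = np.einsum('kc,kcd->d', xp[t:t + k], w) + b
    return out

def conv_feed_forward(x, w1, b1, w2, b2, res_weight=0.5):
    """Feed-forward sublayer with 1-D convolutions in place of the
    usual fully connected layers, plus a weighted residual connection.
    Assumed form: out = res_weight * x + conv2(relu(conv1(x)))."""
    h = np.maximum(conv1d(x, w1, b1), 0.0)  # ReLU after the first conv
    return res_weight * x + conv1d(h, w2, b2)
```

Unlike a position-wise fully connected layer (equivalent to a kernel-size-1 convolution), a kernel size k > 1 lets the sublayer mix information across neighboring time frames, which is one plausible reason such a replacement helps sequence modeling in speech.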
