Abstract

Text-independent speaker verification identifies people from their voice characteristics. In this paper, we propose a new method, the Dual-Sequences Gate Attention Unit, to improve the accuracy of a large-scale speaker verification system. The Dual-Sequences Gate Attention Unit is based on the Gated Dual Attention Unit and the Gated Recurrent Unit. Its two inputs come from the same source: the statistics pooling layer of the x-vector and the frame-level information of the x-vector. It is developed by applying an attention mechanism to the traditional Gated Recurrent Unit to enhance the learning ability of the x-vector system. The whole system takes the statistics pooling from each time-delay neural network layer of the x-vector baseline and passes it through the Dual-Sequences Gate Attention Unit layer to aggregate more information from the varying temporal context of the input features while training at the frame level. We train our model on VoxCeleb2 and then evaluate its accuracy on VoxCeleb1 and the Speakers in the Wild dataset. Finally, the system is compared with the x-vector, L-vector, and ETDNN-OPGRUs x-vector systems, and our proposed method shows a clear improvement. Compared with the x-vector system, the fusion system achieves at least a 17.5% equal error rate improvement on VoxCeleb1 and 0.5% on Speakers in the Wild.
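
The abstract only describes the unit at a high level, so the following is a minimal, hypothetical sketch (in PyTorch-style Python) of the general idea: a GRU-style recurrence over frame-level features whose update is gated by attention conditioned on a statistics-pooling summary of the same utterance. All class, parameter, and dimension names are assumptions for illustration, not the paper's actual equations or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualSequenceGateAttentionUnit(nn.Module):
    """Hypothetical sketch: a GRU-based cell that consumes two streams from
    the same utterance (frame-level TDNN outputs and a statistics-pooling
    vector) and uses attention to gate how each frame updates the state."""

    def __init__(self, frame_dim, stats_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRUCell(frame_dim, hidden_dim)
        # Scores each frame against the utterance-level statistics vector.
        self.attn = nn.Linear(frame_dim + stats_dim, 1)
        # Gate that decides how much of the new state to keep.
        self.gate = nn.Linear(hidden_dim + stats_dim, hidden_dim)

    def forward(self, frames, stats):
        # frames: (batch, time, frame_dim) frame-level features
        # stats:  (batch, stats_dim) statistics-pooling summary
        batch, time, _ = frames.shape
        stats_rep = stats.unsqueeze(1).expand(-1, time, -1)
        scores = self.attn(torch.cat([frames, stats_rep], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=1)  # attention over time steps

        h = frames.new_zeros(batch, self.gru.hidden_size)
        for t in range(time):
            h_new = self.gru(frames[:, t], h)
            # Blend old and new state, gated by attention and the statistics.
            g = torch.sigmoid(self.gate(torch.cat([h_new, stats], dim=-1)))
            alpha = weights[:, t].unsqueeze(-1)
            h = alpha * g * h_new + (1.0 - alpha * g) * h
        return h  # utterance-level embedding for the speaker classifier

# Example usage with assumed dimensions:
# unit = DualSequenceGateAttentionUnit(frame_dim=512, stats_dim=3000, hidden_dim=512)
# emb = unit(torch.randn(8, 200, 512), torch.randn(8, 3000))
```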
