Abstract

Speaker recognition is a technology that uses the identity information carried in the human voice to recognize a speaker. It offers several advantages: voice data are convenient and cheap to collect, and recognition accuracy is high. However, short utterances carry little information, so the performance of voiceprint recognition drops rapidly as utterances get shorter. To obtain enough feature information from short utterances, we propose a recognition model based on SincNet. In the feature extraction layer, the model uses a set of learnable sinc-based filter banks to extract features directly from the raw waveform, which enables the neural network to discover more valuable voiceprint information. In the pooling layer, we design a dual attention pooling method that combines multi-head self-attention with self-attention to enrich the feature information and increase the discriminability of key features, thereby compensating for the limited information in short speech. ArcFace is chosen as the loss function because it maximizes the classification margin in angular space, which improves the classification ability of the model. Experimental results demonstrate that the proposed model outperforms the baseline model.
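The sinc-based front end can be illustrated with a small NumPy sketch. Following the SincNet formulation, each filter is a band-pass kernel built as the difference of two ideal low-pass sinc kernels and tapered with a Hamming window, so only the low cutoff and the bandwidth of each filter would be trained. The function names, the 251-tap kernel size and the 16 kHz sample rate are assumptions for illustration; NumPy has no autodiff, so the cutoffs here are fixed arrays rather than learned parameters.

```python
import numpy as np


def sinc_bandpass(f1_hz, f2_hz, kernel_size=251, sample_rate=16000):
    """Windowed band-pass FIR kernel g[n] = 2*f2*sinc(2*f2*n) - 2*f1*sinc(2*f1*n).

    Cutoffs are normalised by the sample rate; np.sinc(x) = sin(pi*x) / (pi*x).
    """
    if not 0 <= f1_hz < f2_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f1 < f2 <= Nyquist")
    n = np.arange(kernel_size) - (kernel_size - 1) / 2  # symmetric time axis
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate
    # Difference of two ideal low-pass filters gives an ideal band-pass filter.
    band = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return band * np.hamming(kernel_size)  # taper to reduce ripple from truncation


def sinc_filterbank(waveform, low_hz, band_hz, kernel_size=251, sample_rate=16000):
    """Filter a raw waveform with one sinc band-pass kernel per (low, band) pair.

    In a SincNet-style layer, low_hz and band_hz would be the only trainable
    parameters; here they are plain sequences (hypothetical, no gradients).
    """
    kernels = [sinc_bandpass(lo, lo + bw, kernel_size, sample_rate)
               for lo, bw in zip(low_hz, band_hz)]
    # Output shape: (num_filters, len(waveform)).
    return np.stack([np.convolve(waveform, k, mode="same") for k in kernels])
```

Because every kernel is fully determined by two cutoff frequencies, the layer has far fewer parameters than a standard convolution of the same width, and each learned filter stays interpretable as a frequency band.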
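The abstract does not spell out how the two attention branches are formulated or fused, so the following NumPy sketch is only one plausible reading, not the authors' design: a single-head self-attentive pooling branch and a multi-head attentive pooling branch, each producing attention-weighted means of the frame-level features, concatenated into one utterance-level embedding. All function and parameter names, the tanh scoring function and the concatenation step are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attentive_pool(h, W, v):
    """Single-head pooling: alpha_t = softmax_t(v . tanh(W h_t)), output sum_t alpha_t h_t.

    h: (T, D) frame features, W: (A, D) projection, v: (A,) scoring vector.
    """
    scores = np.tanh(h @ W.T) @ v          # (T,) one score per frame
    alpha = softmax(scores)                # attention weights over frames
    return alpha @ h                       # (D,) weighted mean of frames


def multi_head_attentive_pool(h, W, V):
    """Multi-head pooling: each row of V (heads, A) scores the frames independently."""
    scores = np.tanh(h @ W.T) @ V.T        # (T, heads)
    alpha = softmax(scores, axis=0)        # normalise over time for each head
    return (alpha.T @ h).reshape(-1)       # (heads * D,) concatenated head outputs


def dual_attention_pool(h, params):
    """Hypothetical 'dual' pooling: concatenate the single-head and multi-head branches."""
    single = self_attentive_pool(h, params["W1"], params["v"])
    multi = multi_head_attentive_pool(h, params["W2"], params["V"])
    return np.concatenate([single, multi])  # (D + heads * D,)
```

Because the weights are normalized over time, an utterance whose frames are all identical pools back to that frame in every branch, which is a quick sanity check for any attention-pooling implementation.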
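The ArcFace objective adds an angular margin m to the angle θ_y between an embedding and the weight vector of its own class before a scaled softmax: the target logit becomes s·cos(θ_y + m), while every other class keeps s·cos θ_j. A minimal NumPy sketch of that loss follows; the scale s = 30 and margin m = 0.2 are illustrative values, not taken from the paper.

```python
import numpy as np


def arcface_loss(embeddings, weights, labels, s=30.0, m=0.2):
    """Additive angular margin (ArcFace) softmax loss, averaged over the batch.

    embeddings: (N, D), weights: (C, D) one row per speaker class, labels: (N,) ints.
    Note: this sketch omits the usual safeguard for theta + m > pi.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)              # (N, C) cosine of each angle
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])           # angle to the true class
    logits = s * cos
    logits[rows, labels] = s * np.cos(theta + m)   # penalise the target angle by m
    logits -= logits.max(axis=1, keepdims=True)    # stabilise the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

Setting m = 0 recovers a plain softmax over scaled cosine similarities; a positive margin makes the target class harder to satisfy during training, which pushes embeddings of the same speaker closer together in angle.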

