Abstract

Advancements in machine learning and deep learning benefit access control, forensics, and biometrics, particularly speaker identification systems. The SincNet architecture is a convolutional neural network (CNN) designed for speaker identification: it takes one-dimensional raw speech and passes it through an initial convolutional layer composed of Sinc filters. In this work, we present SincsquareNet, a CNN that efficiently learns customized triangular band-pass filters using trainable sinc-squared functions. We also propose fusing SincsquareNet with SincNet for robust speaker identification, and we employ a self-attention mechanism to obtain discriminative features. The proposed framework is validated on the LibriSpeech dataset. The fused approach exploits the strengths of both filter types, helps the network learn more robust features, and converges faster. Experimental results show that, compared to SincsquareNet alone, speaker identification accuracy improves by a relative 8%, while validation loss is reduced by 7%.
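To make the sinc-squared filtering idea concrete, the sketch below shows one way a trainable sinc-squared band-pass layer could be written in PyTorch: squaring a sinc kernel yields a triangular frequency response, and a cosine term shifts it to a learnable centre frequency. The layer name `SincSquaredConv1d`, the centre-frequency/bandwidth parameterization, and all hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn


class SincSquaredConv1d(nn.Module):
    """Minimal sketch of a 1-D convolution whose kernels are trainable
    sinc-squared functions (triangular band-pass filters in frequency).
    Parameterization and initial values are assumptions for illustration."""

    def __init__(self, out_channels=80, kernel_size=251, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        # Learnable centre frequencies and bandwidths in Hz (assumed parameterization).
        f_center = torch.linspace(30.0, sample_rate / 2 - 100.0, out_channels)
        bandwidth = torch.full((out_channels,), 100.0)
        self.f_center = nn.Parameter(f_center)
        self.bandwidth = nn.Parameter(bandwidth)
        # Symmetric time axis (seconds) and a Hamming window to smooth truncation.
        n = torch.arange(kernel_size) - (kernel_size - 1) / 2
        self.register_buffer("t", n / sample_rate)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):
        # sinc(b*t)^2 has a triangular spectrum of width ~b; the cosine factor
        # shifts it to f_center, giving one triangular band-pass per channel.
        b = torch.abs(self.bandwidth).unsqueeze(1)   # (C, 1)
        fc = torch.abs(self.f_center).unsqueeze(1)   # (C, 1)
        t = self.t.unsqueeze(0)                      # (1, K)
        kernels = torch.sinc(b * t) ** 2 * torch.cos(2 * math.pi * fc * t)
        kernels = kernels * self.window
        kernels = kernels / (kernels.norm(dim=1, keepdim=True) + 1e-8)
        return nn.functional.conv1d(x, kernels.unsqueeze(1),
                                    padding=self.kernel_size // 2)


# Usage sketch: raw waveform batch of shape (batch, 1, samples).
layer = SincSquaredConv1d()
waveform = torch.randn(4, 1, 16000)
features = layer(waveform)  # -> (4, 80, 16000)
```

In a fusion setup such as the one described in the abstract, the outputs of a layer like this and of a standard SincNet front end would be combined (for example by concatenation) before the self-attention and classification stages; the exact fusion strategy is not specified here.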
