Abstract
Learning interpretable filters in Convolutional Neural Networks (CNNs) helps to build models that generalize better. Interpretable filters can reveal hidden aspects of the task and guide improvements to the model. One of the most successful approaches in speech processing is SincNet, in which the first layer of a CNN learns band-pass filters directly from the raw waveform. In this paper, similar to SincNet, interpretable filters are proposed, here inspired by Infinite Impulse Response (IIR) filters. The proposed model uses a phase correction process to ensure that the filters have linear phase. The effective length of the truncated IIR filter is calculated from its accumulated energy, and the effect of the filter size on the final results is investigated. The proposed model is evaluated on the speaker identification task using the TIMIT and LibriSpeech datasets and is compared with traditional CNNs and four interpretable kernel-based models. The experimental results show that the proposed model is superior in both performance and convergence speed. Moreover, the speech patterns that uniquely identify a speaker are analyzed by examining the spectra of the learned filters.
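As an illustration of choosing a truncation length from accumulated energy, the sketch below generates the impulse response of a simple two-pole resonator and returns the shortest prefix that holds a given fraction of its energy. It is a minimal example under stated assumptions, not the paper's implementation: the resonator form, the 0.999 energy threshold, the finite horizon used to approximate the total energy, and the function names `resonator_impulse_response` and `effective_length` are all hypothetical choices made for illustration.

```python
import math


def resonator_impulse_response(center_hz, pole_radius, sample_rate, n_samples):
    """Impulse response of an assumed two-pole resonator.

    Difference equation: y[n] = x[n] + a1*y[n-1] + a2*y[n-2],
    with poles at pole_radius * exp(+/- j*theta).
    """
    theta = 2.0 * math.pi * center_hz / sample_rate
    a1 = 2.0 * pole_radius * math.cos(theta)
    a2 = -(pole_radius ** 2)
    h = []
    y1 = y2 = 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        y = x + a1 * y1 + a2 * y2
        h.append(y)
        y2, y1 = y1, y
    return h


def effective_length(h, energy_fraction=0.999):
    """Smallest number of leading taps whose energy reaches the given fraction.

    The total energy is approximated by the energy of the finite sequence h,
    so h should be long enough for the response to have decayed.
    """
    total = sum(v * v for v in h)
    target = energy_fraction * total
    accumulated = 0.0
    for n, v in enumerate(h):
        accumulated += v * v
        if accumulated >= target:
            return n + 1
    return len(h)


# Hypothetical settings: 1 kHz center frequency, 16 kHz sample rate.
h_fast = resonator_impulse_response(1000.0, 0.95, 16000.0, 4096)
h_slow = resonator_impulse_response(1000.0, 0.99, 16000.0, 4096)
len_fast = effective_length(h_fast)
len_slow = effective_length(h_slow)
```

In this sketch, a pole closer to the unit circle decays more slowly, so `len_slow` comes out larger than `len_fast`; a narrower filter needs a longer truncated kernel to retain the same share of its energy.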