Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

Purvi Agrawal, Sriram Ganapathy

doi:10.1109/taslp.2020.3030489

Abstract

Learning interpretable representations from raw data is a significant challenge for time series data such as speech. In this work, we propose a relevance weighting scheme that allows the speech representations to be interpreted during the forward propagation of the model itself. The relevance weighting is achieved using a sub-network that performs feature selection. A relevance sub-network, applied to the output of the first layer of a convolutional neural network model operating on raw speech signals, acts as an acoustic filterbank (FB) layer with relevance weighting. A similar relevance sub-network applied to the second convolutional layer performs modulation filterbank learning with relevance weighting. The full acoustic model, consisting of relevance sub-networks, convolutional layers and feed-forward layers, is trained for a speech recognition task on noisy and reverberant speech in the Aurora-4, CHiME-3 and VOiCES datasets. The proposed representation learning framework is also applied to the task of sound classification on the UrbanSound8K dataset. A detailed analysis of the relevance weights learned by the model shows that they capture information about the underlying speech/audio content. In addition, speech recognition and sound classification experiments show that incorporating relevance weighting in the neural network architecture significantly improves performance.
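The relevance-weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the sub-network's architecture, so everything here is an assumption made for illustration: a small two-layer network with a tanh hidden layer, a softmax across sub-bands, random untrained parameters, and the hypothetical names `relevance_weights` and `apply_relevance`. It is not the authors' implementation.

```python
import numpy as np

def relevance_weights(fb_output, w1, b1, w2, b2):
    """Map each sub-band trajectory to one scalar relevance weight.

    fb_output: (num_bands, num_frames) filterbank-layer output.
    Returns a (num_bands,) vector of non-negative weights summing to 1.
    The two-layer form and the softmax are illustrative assumptions.
    """
    hidden = np.tanh(fb_output @ w1 + b1)        # (num_bands, hidden_dim)
    scores = (hidden @ w2 + b2).squeeze(-1)      # (num_bands,)
    scores = scores - scores.max()               # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()         # softmax across bands

def apply_relevance(fb_output, weights):
    """Scale every sub-band by its relevance weight (soft feature selection)."""
    return fb_output * weights[:, None]

# Toy dimensions and random parameters (assumptions; a real model learns these).
rng = np.random.default_rng(0)
num_bands, num_frames, hidden_dim = 8, 20, 16
fb_output = rng.standard_normal((num_bands, num_frames))
w1 = rng.standard_normal((num_frames, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, 1)) * 0.1
b2 = np.zeros(1)

weights = relevance_weights(fb_output, w1, b1, w2, b2)
weighted = apply_relevance(fb_output, weights)
```

Because the weights are computed in the forward pass and multiply the sub-band outputs directly, inspecting `weights` for a given input indicates which sub-bands the model emphasizes, which is the sense in which such a layer is interpretable.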

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2020
Citations: 16

Similar Papers

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations
Purvi Agrawal ... Sriram Ganapathy
25 Oct 2020

Representation Learning for Speech Recognition Using Feedback Based Relevance Weighting
Purvi Agrawal ... Sriram Ganapathy
06 Jun 2021

End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition
Dimitri Palaz ... Ronan Collobert
Speech Communication | VOL. 108
30 Jan 2019

Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
Dimitri Palaz ... Mathew Magimai-Doss
25 Aug 2013
