Abstract

Convolutional Restricted Boltzmann Machine (ConvRBM) as a model for speech signal is presented in this paper. We have developed ConvRBM with sampling from noisy rectified linear units (NReLUs). ConvRBM is trained in an unsupervised way to model speech signal of arbitrary lengths. Weights of the model can represent an auditory-like filterbank. Our proposed learned filterbank is also nonlinear with respect to center frequencies of subband filters similar to standard filterbanks (such as Mel, Bark, ERB, etc.). We have used our proposed model as a front-end to learn features and applied to speech recognition task. Performance of ConvRBM features is improved compared to MFCC with relative improvement of 5% on TIMIT test set and 7% on WSJ0 database for both Nov'92 test sets using GMM-HMM systems. With DNN-HMM systems, we achieved relative improvement of 3% on TIMIT test set over MFCC and Mel filterbank (FBANK). On WSJ0 Nov'92 test sets, we achieved relative improvement of 4–14% using ConvRBM features over MFCC features and 3.6–5.6% using ConvRBM filterbank over FBANK features.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call