Speech Recognition Based on Deep Tensor Neural Network and Multifactor Feature

Yahui Shan, Shixuan Du, Qingran Zhan, Jing Wang, Xiang Xie, Min Liu

doi:10.1109/apsipaasc47483.2019.9023251

Abstract

This paper presents a speech recognition system based on a deep tensor neural network that uses multifactor features as the input to the acoustic model. First, a deep neural network is trained on the MOCHA database [1] to estimate articulatory features from input speech. Mel-frequency cepstral coefficients (MFCCs) are then combined with the estimated articulatory features to form the multifactor features. A deep tensor neural network, which involves tensor interactions among neurons, serves as the acoustic model. Speech recognition results indicate that the multifactor features improve recognition performance under both clean and noisy background conditions, and that the deep tensor neural network, because of its tensor interactions, models multifactor features better than a conventional deep neural network.
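
As a rough illustration of the two ideas in the abstract, the sketch below concatenates an MFCC vector with an articulatory vector for one frame and passes the result through a single tensor layer. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the layer equations, so the double-projection form used here (two sigmoid projections whose pairwise products feed the next layer) is an assumption, as are the feature sizes, the random weights, and all function names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(W, b, x):
    # One conventional layer: sigmoid(W x + b), with W stored as a list of rows.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def multifactor_feature(mfcc, articulatory):
    # Frame-level concatenation of the two feature streams (assumed combination).
    return list(mfcc) + list(articulatory)


def tensor_layer(x, W1, b1, W2, b2, T, c):
    # Assumed double-projection form: project the same input into two subspaces.
    h1 = affine_sigmoid(W1, b1, x)
    h2 = affine_sigmoid(W2, b2, x)
    # Tensor interaction: output unit k sees every product h1[i] * h2[j].
    out = []
    for Tk, ck in zip(T, c):
        z = ck + sum(Tk[i][j] * h1[i] * h2[j]
                     for i in range(len(h1)) for j in range(len(h2)))
        out.append(sigmoid(z))
    return out


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
mfcc = [rng.gauss(0, 1) for _ in range(13)]    # 13 MFCCs per frame: assumed size
artic = [rng.uniform(0, 1) for _ in range(8)]  # 8 articulatory values: assumed size
x = multifactor_feature(mfcc, artic)

d, p, q, n_out = len(x), 4, 4, 5               # illustrative layer sizes
W1, b1 = rand_matrix(rng, p, d), [0.0] * p
W2, b2 = rand_matrix(rng, q, d), [0.0] * q
T = [rand_matrix(rng, p, q) for _ in range(n_out)]
c = [0.0] * n_out

y = tensor_layer(x, W1, b1, W2, b2, T, c)
print(len(x), len(y))
```

In a conventional layer each output depends on a weighted sum of its inputs; here each output depends on products of pairs of hidden units, which is one way to let factors such as spectral and articulatory cues interact multiplicatively.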

Full Text