Abstract

Recently, deep neural networks (DNNs) have attracted a great deal of interest among speech recognition researchers. DNN training is computationally expensive due to the model's large number of parameters. DNN performance can be improved with pre-training methods; however, DNN learning becomes difficult if the pre-training is inefficient. This paper proposes a new pre-training method that exploits both gender and phoneme information for speech recognition. We use speaker gender information alongside phoneme information to construct acoustic models more precisely. The new approach, named the gender-aware deep Boltzmann machine (GADBM), is used for DNN pre-training. GADBM exploits this additional information, which improves recognition accuracy. For this purpose, we modify the overall structure of the deep Boltzmann machine (DBM) so that it can take the additional information into account. Experimental results on the TIMIT dataset show that the proposed method outperforms the deep belief network and the DBM on the phone recognition task. In addition, parameter tuning in the proposed method further improves model performance.
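The abstract does not detail the GADBM structure. As an illustration only, the minimal Python/NumPy sketch below shows one common way to inject side information such as a one-hot gender label into Boltzmann-machine pre-training: a restricted Boltzmann machine whose hidden layer is connected to both the acoustic features and the label units, trained with one step of contrastive divergence (CD-1). All names and sizes here (LabeledRBM, 39-dimensional features, 128 hidden units) are illustrative assumptions, not the paper's implementation, and for simplicity the visible units are Bernoulli, whereas real acoustic features would typically call for Gaussian visible units.

# Illustrative sketch only (not the paper's GADBM): an RBM whose hidden layer
# is conditioned on both an acoustic feature vector v and a one-hot gender
# label y, trained with one step of contrastive divergence (CD-1).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LabeledRBM:
    """RBM over (features v, label y) sharing one hidden layer h."""
    def __init__(self, n_visible, n_labels, n_hidden, lr=0.01):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # v-h weights
        self.U = 0.01 * rng.standard_normal((n_labels, n_hidden))   # y-h weights
        self.b = np.zeros(n_visible)   # visible bias
        self.d = np.zeros(n_labels)    # label bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def hidden_probs(self, v, y):
        # Hidden activations depend on the features AND the gender label.
        return sigmoid(v @ self.W + y @ self.U + self.c)

    def cd1_step(self, v0, y0):
        # Positive phase: hidden probabilities given data and gender label.
        h0 = self.hidden_probs(v0, y0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: reconstruct features and labels, then re-infer hidden.
        v1 = sigmoid(h_sample @ self.W.T + self.b)
        logits = h_sample @ self.U.T + self.d
        y1 = np.exp(logits - logits.max(axis=1, keepdims=True))
        y1 /= y1.sum(axis=1, keepdims=True)           # softmax over gender labels
        h1 = self.hidden_probs(v1, y1)
        # Parameter updates from the CD-1 gradient approximation.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.U += self.lr * (y0.T @ h0 - y1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.d += self.lr * (y0 - y1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Toy usage: 39-dim MFCC-like frames, 2 gender labels, 128 hidden units.
rbm = LabeledRBM(n_visible=39, n_labels=2, n_hidden=128)
features = rng.random((32, 39))                       # placeholder acoustic frames
genders = np.eye(2)[rng.integers(0, 2, size=32)]      # one-hot gender labels
for _ in range(10):
    rbm.cd1_step(features, genders)

In a pre-training pipeline of this kind, the learned hidden layer would initialize one layer of the DNN, with further layers stacked and then fine-tuned on the phone recognition objective.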
