A novel pitch extraction based on jointly trained deep BLSTM Recurrent Neural Networks with bottleneck features

Bin Liu, Jianhua Tao, Dawei Zhang, Yibin Zheng

doi:10.1109/icassp.2017.7952173

Abstract

Pitch is an important characteristic of speech and is useful in many applications. However, estimating pitch in strong noise remains challenging. In this paper, we propose a joint training approach to pitch determination. First, a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) is trained to map noisy speech features to clean speech features. Second, pitch is estimated by another BLSTM-RNN model. The feature-mapping network serves as a noise normalization module that explicitly generates clean features, from which the subsequent network can estimate pitch more easily. Each BLSTM-RNN is trained on sequences of frame-level features and is therefore capable of learning temporal dynamics. We also propose incorporating bottleneck features into pitch estimation. Experimental results show that the proposed method estimates pitch accurately and generalizes well to new speakers and noise conditions. It also significantly outperforms other state-of-the-art pitch estimation algorithms.
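The two-stage pipeline described above can be sketched as a NumPy forward pass. This is not the authors' implementation: the single-layer BLSTMs, the layer sizes, the narrow tanh projection standing in for the bottleneck features (`bn`), the treatment of pitch as per-frame classification over quantized bins, and the weighted "mapping MSE plus pitch cross-entropy" objective (`alpha`) are all illustrative assumptions. Gradient computation is omitted; in a real system the joint loss would be backpropagated through both networks with an automatic-differentiation framework, which is what lets the pitch objective shape the feature-mapping network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, n_in, n_hid):
    # One weight matrix holds all four gates (input, forget, output, candidate).
    return {"W": rng.normal(0, 0.1, (4 * n_hid, n_in + n_hid)), "b": np.zeros(4 * n_hid)}

def lstm_pass(p, x):
    """Unidirectional LSTM over x of shape (T, n_in); returns (T, n_hid)."""
    n = p["b"].shape[0] // 4
    h, c, out = np.zeros(n), np.zeros(n), []
    for x_t in x:
        z = p["W"] @ np.concatenate([x_t, h]) + p["b"]
        i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
        g = np.tanh(z[3 * n:])
        c = f * c + i * g          # cell state carries long-range context
        h = o * np.tanh(c)
        out.append(h)
    return np.stack(out)

def blstm(fwd, bwd, x):
    """Bidirectional LSTM: forward pass plus a time-reversed backward pass."""
    h_f = lstm_pass(fwd, x)
    h_b = lstm_pass(bwd, x[::-1])[::-1]   # re-reverse so frames line up
    return np.concatenate([h_f, h_b], axis=1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class JointPitchModel:
    """Hypothetical sketch: feature-mapping BLSTM feeding a pitch BLSTM."""

    def __init__(self, n_feat, n_hid, n_bottleneck, n_pitch_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Stage 1: noisy features -> estimated clean features.
        self.map_f, self.map_b = init_lstm(rng, n_feat, n_hid), init_lstm(rng, n_feat, n_hid)
        self.map_out = rng.normal(0, 0.1, (2 * n_hid, n_feat))
        # Assumed bottleneck: a narrow (untrained here) tanh projection of the input.
        self.bn = rng.normal(0, 0.1, (n_feat, n_bottleneck))
        # Stage 2: pitch BLSTM on [mapped features, bottleneck features].
        n_in2 = n_feat + n_bottleneck
        self.pit_f, self.pit_b = init_lstm(rng, n_in2, n_hid), init_lstm(rng, n_in2, n_hid)
        self.pit_out = rng.normal(0, 0.1, (2 * n_hid, n_pitch_classes))

    def forward(self, noisy):
        mapped = blstm(self.map_f, self.map_b, noisy) @ self.map_out
        bottleneck = np.tanh(noisy @ self.bn)
        h = blstm(self.pit_f, self.pit_b, np.concatenate([mapped, bottleneck], axis=1))
        return mapped, softmax(h @ self.pit_out)   # per-frame pitch-class posteriors

def joint_loss(mapped, clean, probs, labels, alpha=0.5):
    """Assumed joint objective: alpha * mapping MSE + per-frame cross-entropy."""
    mse = np.mean((mapped - clean) ** 2)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return alpha * mse + ce

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, D, K = 50, 40, 68            # frames, feature dim, hypothetical pitch classes
    noisy, clean = rng.normal(size=(T, D)), rng.normal(size=(T, D))
    labels = rng.integers(0, K, size=T)
    model = JointPitchModel(n_feat=D, n_hid=32, n_bottleneck=16, n_pitch_classes=K)
    mapped, probs = model.forward(noisy)
    print(mapped.shape, probs.shape, joint_loss(mapped, clean, probs, labels))
```

Because the backward LSTM reads the sequence in reverse, every output frame depends on both past and future frames, which is the kind of temporal context the BLSTM-RNN is meant to exploit.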

Full Text