DNN-based Speech Recognition System dealing with Motor State as Auxiliary Information of DNN for Head Shaking Robot

Moa Lee,Joon-Hyuk Chang

doi:10.1109/iros.2018.8593396

Abstract

In this paper, a deep neural network (DNN) based integrated background noise suppression and acoustic modeling for speech recognition proposed in which on/off state of the motor for the head shaking robot is employed as the relevant auxiliary information of the DNN input. Since the motor sound being generated when the robot is moving or shaking its head severely degrades the performance of the speech recognition accuracy, we propose to use the motor on/off state as additional information when designing the DNN-based recognition system. Our speech recognition algorithm consists of two parts including the feature mapping model for feature enhancement and the acoustic model for phoneme recognition. As for the feature mapping, the stacked DNN is designed for the precise feature enhancement such that the lower DNN and upper DNN are trained separately and combined after which the motor state is plugged into both the lower DNN and upper DNN in addition to the input noisy speech. Then, the acoustic model is trained upon the feature enhancement model in which the motor state is again used as the augmented feature. The proposed technique to suppress the acoustic and motor noises was evaluated in term of the phoneme error rate (PER) and showed a significant improvement over the conventional system.

Full Text