Abstract

This paper addresses the reduction of the computational cost of training a Deep Neural Network (DNN), in particular for sound identification from highly noise-contaminated audio recorded with a microphone array embedded in an Unmanned Aerial Vehicle (UAV), aiming at quick and wide-area detection of people's voices in a disaster situation. End-to-end DNN training is known to achieve high performance, since it uses a large, highly non-linear network trained on a large amount of raw input signals without preprocessing. Its computational cost, however, is high because of the complexity of such a network. We therefore propose two-stage DNN training using two separately trained networks: one for denoising of sound sources and one for sound source identification. Since the large network is divided into two smaller networks, the overall complexity is expected to decrease, and each network can be specialized to its own task of denoising or identification. This results in faster convergence and a reduction of the computational cost of DNN training. Preliminary results showed that the proposed two-stage network required only 71% of the training time of end-to-end training while maintaining sound source identification accuracy, using noisy acoustic signals recorded with an 8-channel circular microphone array embedded in a UAV.
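To make the two-stage idea concrete, the following is a minimal PyTorch-style sketch of training a denoising network and an identification network separately, as opposed to one large end-to-end network. The module names (DenoisingNet, IdentificationNet), layer sizes, feature dimensions, and placeholder data are illustrative assumptions, not the architecture or data used in the paper.

```python
# Minimal sketch of two-stage training: a denoiser trained on (noisy, clean) pairs,
# then an identifier trained on the denoiser's outputs. All shapes and names are
# hypothetical placeholders for illustration only.
import torch
import torch.nn as nn

class DenoisingNet(nn.Module):
    """Stage 1: maps noisy acoustic features to denoised features."""
    def __init__(self, n_features=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, n_features),
        )

    def forward(self, x):
        return self.net(x)

class IdentificationNet(nn.Module):
    """Stage 2: classifies denoised features into sound-source classes."""
    def __init__(self, n_features=256, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.net(x)

# --- Stage 1: train the denoiser alone on (noisy, clean) feature pairs ---
denoiser = DenoisingNet()
opt1 = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
noisy_batch = torch.randn(32, 256)   # placeholder noisy features
clean_batch = torch.randn(32, 256)   # placeholder clean targets
loss1 = nn.functional.mse_loss(denoiser(noisy_batch), clean_batch)
opt1.zero_grad(); loss1.backward(); opt1.step()

# --- Stage 2: freeze the denoiser, train the identifier on its outputs ---
identifier = IdentificationNet()
opt2 = torch.optim.Adam(identifier.parameters(), lr=1e-3)
labels = torch.randint(0, 10, (32,))  # placeholder class labels
with torch.no_grad():
    denoised = denoiser(noisy_batch)   # stage-1 output, not backpropagated through
loss2 = nn.functional.cross_entropy(identifier(denoised), labels)
opt2.zero_grad(); loss2.backward(); opt2.step()
```

Because each stage optimizes a smaller network against a task-specific loss (reconstruction, then classification), each training problem is simpler than jointly optimizing a single large end-to-end network, which is the intuition behind the reported reduction in training time.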
