Abstract

Since the introduction of deep neural network (DNN)-based acoustic models to automatic speech recognition (ASR), robust ASR using DNNs has been an active research topic. However, most DNN-based techniques do not consider the reliability of their estimates, and this degrades ASR performance, especially under training-test mismatch conditions. In this paper, we propose a novel deep learning-based acoustic modeling technique that measures and takes account of this reliability using a single DNN. The proposed approach describes the mapping between the noisy input and the clean features as a stochastic process. Accordingly, statistical modeling is applied to the DNN-based acoustic model to predict the posterior distribution of the clean speech features given distorted input data. In addition, by trying two different probabilistic models for the assumed clean feature distribution, we investigate which distribution is more suitable under various environmental conditions. The proposed technique is shown to outperform conventional DNN-based techniques on the Aurora-4 database and under mismatched noise conditions.
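
To illustrate what predicting a posterior distribution rather than a point estimate can look like, the following is a minimal sketch that assumes a diagonal Gaussian over the clean features. The abstract does not name the two distributions studied, so the Gaussian choice, the function names `gaussian_nll` and `reliability`, and the inverse-variance reliability score are all assumptions made for illustration, not the authors' formulation.

```python
import math

def gaussian_nll(target, mean, log_var):
    """Average negative log-likelihood of `target` under a diagonal Gaussian.

    `mean` and `log_var` stand in for the two outputs a network would emit
    per feature dimension (assumed parameterization, not from the paper).
    """
    total = 0.0
    for y, mu, s in zip(target, mean, log_var):
        # 0.5 * [log(2*pi) + log(sigma^2) + (y - mu)^2 / sigma^2]
        total += 0.5 * (math.log(2 * math.pi) + s + (y - mu) ** 2 / math.exp(s))
    return total / len(target)

def reliability(log_var):
    """Hypothetical reliability score: inverse of the mean predicted variance."""
    mean_var = sum(math.exp(s) for s in log_var) / len(log_var)
    return 1.0 / mean_var

if __name__ == "__main__":
    clean = [0.0, 1.0]       # toy clean feature vector
    predicted = [0.1, 3.0]   # toy predicted means; second dimension is far off
    confident = [0.0, 0.0]   # log-variance 0, i.e. unit variance everywhere
    hedged = [0.0, 2.0]      # larger variance on the poorly estimated dimension
    print(gaussian_nll(clean, predicted, confident))
    print(gaussian_nll(clean, predicted, hedged))
    print(reliability(confident), reliability(hedged))
```

Under this assumed loss, a large estimation error is penalized less when the model also reports a large variance for that dimension, so the predicted variance can act as a per-dimension indicator of how far the estimate should be trusted.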
