심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식*

Min Sik Kim,Hyung Soon Kim

doi:10.13064/ksss.2020.12.3.065

Min Sik Kim, Hyung Soon Kim

Open Access

PDF Available

https://doi.org/10.13064/ksss.2020.12.3.065

Copy DOI

Export

Save

Cite

Journal: Phonetics and Speech Sciences	Publication Date: Sep 1, 2020
Citations: 1	License type: CC BY-NC 4.0

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions through the normalization of statistical characteristics of acoustic feature parameters. It demonstrates excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes the speech recognition performance by properly reducing the impact of environmental mismatch, while preserving useful information for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with the MEVN over the MN and the MVN, depending on the degree of variance normalization.

Full Text