Abstract

Consumer electronics equipped with a microphone array, such as car navigation devices and headsets, commonly implement speech enhancement techniques based on the gradient method to cope with additive noise. However, these techniques were originally developed for voice communication: they can maximize the signal-to-distortion ratio (SDR), but they do not always maximize automatic speech recognition (ASR) accuracy. For this reason, the front-end speech enhancement parameters have had to be adjusted by human experts for each environment and acoustic model. In this study, we developed a novel system that maximizes the accuracy of a given ASR engine by automatically adjusting the front-end speech enhancement. The proposed method allows consumers to use ASR on such devices with less stress when ambient noise varies. A genetic algorithm (GA) is used to generate parameter values of the front-end speech enhancement for particular environments. By clustering the environments in advance based on noise features, the generated values can be dynamically assigned to input speech signals. In evaluations, parameter values determined by our method outperformed those adjusted by a human expert.
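
The GA-based parameter search described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the population settings, and in particular `toy_asr_accuracy` are assumptions introduced here for illustration. In a real setting the fitness function would run the enhancement front end and the ASR engine on recorded speech and return the recognition accuracy.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Search for a parameter vector that maximizes `fitness`.

    `bounds` is a list of (low, high) pairs, one per parameter.
    All settings here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: take each parameter from either parent.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(individual):
        # Gaussian perturbation, clipped back into the allowed range.
        mutated = []
        for value, (lo, hi) in zip(individual, bounds):
            if rng.random() < mutation_rate:
                value += rng.gauss(0.0, 0.1 * (hi - lo))
            mutated.append(min(hi, max(lo, value)))
        return mutated

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best quarter unchanged (elitism) and breed the rest.
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 4]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness)


def toy_asr_accuracy(params):
    # Hypothetical stand-in for "ASR accuracy under these enhancement
    # parameters": it simply peaks at an arbitrary target vector.
    target = [0.3, 0.7]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best = genetic_search(toy_asr_accuracy, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Under the same assumptions, one such search would be run per noise cluster, and at run time the parameter set of the cluster nearest to the observed noise features would be selected.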
