Phone-based filter parameter optimization of filter and sum robust speech recognition using likelihood maximization

Bahram Kouhi-Jelehkaran,Hamidreza Bakhshi,Farbod Razzazi

doi:10.1016/j.aeue.2009.11.014

Abstract

Because of noise and reverberation, accuracy of speech recognition systems decreases when the distance between talker and microphone increases. By the using of microphone arrays and appropriate filtering of received signals, the accuracy of recognizer can be increased. Many different methods for using microphone arrays have been proposed that can be classified into two main approaches: systems that perform in two independent stages of array processing and then recognition and systems that use array processing to generate a sequence of features which maximize the likelihood of generating the correct hypothesis in recognition phase. Following second approach, in this paper a new method for microphone array processing is proposed in which the parameters of array processing are adjusted in calibration phase based on phones used in language and maximum likelihood method. Optimized filter parameters are stored and used during recognition phase. A new modified Viterbi algorithm using optimal phone-based filter parameters is used for recognition phase. The proposed algorithm is analytically formulated and Persian language is used to find any improvement in speech recognition accuracy compared with results of delay and sum and utterance-based filter and sum algorithms. The results show 12.2% improvement in accuracy compared to utterance-based algorithm.

Full Text