Abstract

A microphone array is a promising solution for realizing hands-free speech recognition in real environments. Accurate talker localization is very important for speech recognition using a microphone array. However localization of a moving talker is difficult in noisy reverberant environments. Talker localization errors degrade the performance of speech recognition. To solve the problem, this paper proposes a new speech recognition algorithm which considers multiple talker direction hypotheses simultaneously. The proposed algorithm performs a Viterbi search in 3-dimensional trellis space composed of talker directions, input frames, and HMM states. As a result, a locus of the talker and a phoneme sequence of the speech are obtained by finding an optimal path with the highest likelihood. To evaluate the performance of the proposed algorithm, speech recognition experiments are carried out on simulated data and real environment data. These results show that the proposed algorithm works well even if the talker moves.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call