Abstract
Robust distant speech recognition (DSR) is necessary in many speech technology applications using multiple microphones but has received only limited treatment in the literature. This paper addresses communication with in-vehicle voice-controlled systems, one application of DSR. Two established approaches to DSR are (i) signal-level combination, using beamforming followed by automatic speech recognition (ASR), and (ii) word hypothesis-level combination, using several speech recognition engines followed by confusion network combination or recognizer output voting error reduction (ROVER). In addition to these approaches, training-level combination is possible: the recognizer is trained on audio signals from multiple channels (microphones). The authors investigate how these three methods can be leveraged for in-vehicle ASR using the CU-Move corpus, and propose various combinations of them to find an optimum structure for in-vehicle ASR. The authors also investigate the effect of speaker adaptation (SA). The authors' experience shows that applying SA on individual channels and merging the results with ROVER reduces the negative effects of SA reported by others in the field, and illustrates the overall improvement obtained with front-end enhancement techniques in DSR.