Abstract

This paper introduces a new methodology for creating a multimodal corpus for audio-visual speech recognition in driver monitoring systems. Multimodal speech recognition makes it possible to rely on audio data when video data are useless (e.g., at nighttime), and to rely on video data in acoustically noisy conditions (e.g., on highways). The article discusses several basic scenarios in which speech recognition in the vehicle environment is required for interaction with the driver monitoring system. The methodology defines the main stages of, and requirements for, building a multimodal corpus. The paper also describes the metaparameters that the multimodal corpus must satisfy. In addition, a software package for recording an audio-visual speech corpus is described.
