Abstract

This paper implements a Multi-Modal Instruction Agent (hereinafter, MMIA) that synchronizes the audio and gesture modalities, and proposes improved fusion and fission rules for simultaneous multi-modality based on the SNNR (Signal Plus Noise to Noise Ratio) and a fuzzy value, built on an embedded KSSL (Korean Standard Sign Language) recognizer using a WPS (Wearable Personal Station) and VoiceXML. Our approach fuses and recognizes sentence- and word-based instruction models expressed in speech and KSSL, and then, in real time, fissions the recognition result according to a weight decision rule and renders it as synthetic speech and a visual illustration (graphical display on an HMD, Head Mounted Display). In experiments, the average recognition rates of the MMIA for the prescribed 62 sentence and 152 word instruction models were 94.33% and 96.85% in clean environments, and 92.29% and 92.91% in noisy environments.
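
The abstract describes SNNR-driven fusion of the speech and gesture modalities and a weight decision rule for fissioning the result into speech and display output. The sketch below illustrates that general idea only; the function names, the threshold, and the output weights are illustrative assumptions and do not reproduce the authors' actual rules.

```python
import math

# Hypothetical sketch of SNNR-based fusion and weight-based fission.
# All names, thresholds, and weights are assumptions for illustration.

def snnr_db(signal_plus_noise_power: float, noise_power: float) -> float:
    """Signal Plus Noise to Noise Ratio (SNNR) in dB."""
    return 10.0 * math.log10(signal_plus_noise_power / noise_power)

def fuse(speech_score: float, gesture_score: float, snnr: float,
         snnr_threshold: float = 15.0) -> float:
    """Weight the speech modality down as the acoustic SNNR drops
    (a simple fuzzy-style membership, assumed here)."""
    speech_weight = min(1.0, max(0.0, snnr / snnr_threshold))
    gesture_weight = 1.0 - speech_weight
    return speech_weight * speech_score + gesture_weight * gesture_score

def fission(confidence: float, speech_weight: float = 0.6) -> dict:
    """Split the recognized instruction between output modalities
    (synthetic speech vs. HMD display) by a weight decision rule."""
    return {"speech": speech_weight * confidence,
            "display": (1.0 - speech_weight) * confidence}

if __name__ == "__main__":
    snnr = snnr_db(signal_plus_noise_power=2.0, noise_power=0.5)  # about 6 dB
    fused = fuse(speech_score=0.90, gesture_score=0.95, snnr=snnr)
    print(fission(fused))
```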
