Abstract

We present a comparative analysis of multi-modal user inputs combining speech and pen gestures, together with their semantically equivalent uni-modal (speech-only) counterparts. The multi-modal interactions are derived from a corpus collected with a Pocket PC emulator in the context of navigation around Beijing. We devise a cross-modality integration methodology that interprets a multi-modal input and paraphrases it as a semantically equivalent, uni-modal input. Thus we generate parallel multi-modal (MM) and uni-modal (UM) corpora for comparative study. Empirical analysis based on class trigram perplexities shows two categories of data: (PP_MM = PP_UM) and (PP_MM < PP_UM). The former involves complementarity across modalities in expressing the user's intent, including occurrences of ellipses. The latter involves redundancy, which will be useful for handling recognition errors by exploiting mutual reinforcement across modalities. We present explanatory examples of data in these two categories.
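To make the perplexity comparison concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of how class trigram perplexity might be computed over parallel MM and UM corpora. The corpus format, the semantic class labels, and the add-alpha smoothing are assumptions introduced purely for illustration.

import math
from collections import defaultdict

def train_class_trigram(sentences):
    # Collect trigram and bigram counts over class-label sequences,
    # padding each utterance with <s> markers and closing with </s>.
    tri, bi = defaultdict(int), defaultdict(int)
    for classes in sentences:
        seq = ["<s>", "<s>"] + classes + ["</s>"]
        for i in range(2, len(seq)):
            tri[(seq[i-2], seq[i-1], seq[i])] += 1
            bi[(seq[i-2], seq[i-1])] += 1
    return tri, bi

def perplexity(sentences, tri, bi, vocab_size, alpha=1.0):
    # Class trigram perplexity with add-alpha smoothing (an assumption;
    # the paper does not specify its smoothing scheme here).
    log_prob, n_tokens = 0.0, 0
    for classes in sentences:
        seq = ["<s>", "<s>"] + classes + ["</s>"]
        for i in range(2, len(seq)):
            num = tri[(seq[i-2], seq[i-1], seq[i])] + alpha
            den = bi[(seq[i-2], seq[i-1])] + alpha * vocab_size
            log_prob += math.log(num / den)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Toy parallel corpora: each utterance is a sequence of semantic class labels
# (hypothetical labels, not the classes used in the paper).
mm_corpus = [["DEICTIC", "ACTION", "LOC_REF"], ["DEICTIC", "ACTION"]]
um_corpus = [["LOC_NAME", "ACTION", "LOC_NAME"], ["LOC_NAME", "ACTION"]]

for name, corpus in [("MM", mm_corpus), ("UM", um_corpus)]:
    tri, bi = train_class_trigram(corpus)
    vocab = {c for sent in corpus for c in sent} | {"</s>"}
    print(name, "class trigram perplexity:",
          round(perplexity(corpus, tri, bi, len(vocab)), 2))

In the paper's setting, the corpora compared would be the parallel MM and UM sets produced by the paraphrasing step described above.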
