Latent Semantic Analysis for Multimodal User Input With Speech and Gestures

Pui-Yu Hui, Helen Meng

doi:10.1109/taslp.2013.2294586

Abstract

This paper describes our work on the semantic interpretation of a “multimodal language” with speech and gestures using latent semantic analysis (LSA). Our aim is to infer the domain-specific informational goal of multimodal inputs. The informational goal is characterized by lexical terms used in the spoken modality, partial semantics of gestures in the pen modality, and term co-occurrence patterns across modalities, which lead to “multimodal terms.” We designed and collected a multimodal corpus of navigational inquiries, and obtained both perfect (i.e., manual) and imperfect (i.e., automatic, via recognition) transcriptions of them. We automatically align parsed spoken locative references (SLRs) with their corresponding pen gesture(s) using Viterbi alignment, according to their numeric and location type features. We then characterize each cross-modal integration pattern as a 3-tuple multimodal term consisting of the SLR, the pen gesture type, and their temporal relationship. We propose to use LSA to derive the latent semantics from the manual and automatic transcriptions of the collected multimodal inputs. To this end, both multimodal and lexical terms are used to compose an inquiry-term matrix, which is then factorized using singular value decomposition (SVD) to derive the latent semantics automatically. On a disjoint test set, informational goal inference based on the latent semantics achieves an accuracy of 99% with a perfect projection model and 84% with an imperfect one, which is significantly better (by at least 9.9% absolute) than the baseline using a vector-space model (VSM).
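The inquiry-term matrix and SVD step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy inquiries, the goal labels (`ROUTE`, `OPENING_HOURS`), the multimodal term values, the rank `k=2`, and the nearest-neighbour cosine rule for goal inference are all assumptions made here for illustration, and the helper names (`fit_lsa`, `infer_goal`) are hypothetical. The sketch assumes NumPy is available.

```python
import numpy as np

# Hypothetical multimodal terms: (spoken locative reference, pen gesture type,
# temporal relationship). The values are illustrative, not from the corpus.
M_POINT = ("here", "point", "overlap")
M_CIRCLE = ("this place", "circle", "overlap")

# Toy training inquiries: lexical terms plus multimodal terms, with made-up goals.
TRAIN = [
    (["how", "get", "from", "to", M_POINT], "ROUTE"),
    (["route", "from", "to", M_POINT], "ROUTE"),
    (["when", "open", M_CIRCLE], "OPENING_HOURS"),
    (["opening", "hours", "open", M_CIRCLE], "OPENING_HOURS"),
]


def build_vocab(inquiries):
    """Map each distinct term (lexical or multimodal) to a column index."""
    vocab = {}
    for terms, _ in inquiries:
        for term in terms:
            vocab.setdefault(term, len(vocab))
    return vocab


def vectorize(terms, vocab):
    """Count-based term vector; out-of-vocabulary terms are ignored."""
    vec = np.zeros(len(vocab))
    for term in terms:
        if term in vocab:
            vec[vocab[term]] += 1.0
    return vec


def fit_lsa(inquiries, k):
    """Factorize the inquiry-term matrix and keep the top-k latent dimensions."""
    vocab = build_vocab(inquiries)
    matrix = np.array([vectorize(terms, vocab) for terms, _ in inquiries])
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    v_k = vt[:k].T                  # projection from term space to latent space
    train_latent = matrix @ v_k     # training inquiries in latent space
    labels = [goal for _, goal in inquiries]
    return vocab, v_k, train_latent, labels


def infer_goal(terms, model):
    """Assumed inference rule: goal of the most cosine-similar training inquiry."""
    vocab, v_k, train_latent, labels = model
    query = vectorize(terms, vocab) @ v_k
    norms = np.linalg.norm(train_latent, axis=1) * np.linalg.norm(query)
    sims = train_latent @ query / (norms + 1e-12)
    return labels[int(np.argmax(sims))]


MODEL = fit_lsa(TRAIN, k=2)
print(infer_goal(["shortest", "route", M_POINT], MODEL))
```

Treating each 3-tuple as an ordinary vocabulary entry lets lexical and multimodal terms share one matrix, which is the property the abstract relies on; a real system would likely also weight the counts and choose the rank empirically.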

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Feb 1, 2014
Citations: 53

Similar Papers

Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions
Pui-Yu Hui ... Wai-Kit Lo
07 Feb 2010

Genetic algorithm for text clustering based on latent semantic indexing
Wei Song ... Soon Cheol Park
Computers & Mathematics with Applications | VOL. 57
11 Nov 2008

Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology
Wei Song ... Soon Cheol Park
01 Jan 2007

Learning the Latent Semantic Space for Ranking in Text Retrieval
Jun Yan ... Shuicheng Yan
01 Dec 2008
