Abstract
One challenge in building collaborative design tools that use speech and sketch input is distinguishing gesture pen strokes from those representing device structure, that is, object strokes. In previous work, we developed a gesture/object classifier that uses features computed from the pen strokes and the speech aligned with them. Experiments indicated that the speech features were the most important for distinguishing gestures, thus highlighting the critical importance of accurate speech–sketch alignment. Consequently, we have developed a new alignment technique that employs a two-step process: the speech is first explicitly segmented (primarily into clauses), and then the segments are aligned with the pen strokes. Our speech segmentation step is unique in that it uses sketch features to locate segment boundaries in multimodal dialog. In addition, it uses a single classifier to directly combine word-based, prosodic (pause), and sketch-based features. In the second step, segments are initially aligned with strokes based on temporal correlation, and then classifiers are used to detect and correct two common alignment errors. Our two-step technique has proven to be substantially more accurate at alignment than the existing technique, which lacked explicit segmentation. More importantly, for nearly all cases, our new technique yields greater gesture classification accuracy than the existing technique and performs nearly as well as the benchmark of manual speech–sketch alignment.
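The second step described above, an initial alignment based on temporal correlation, can be illustrated with a minimal sketch. The data structures (time-stamped speech segments and pen strokes) and the greatest-overlap heuristic below are illustrative assumptions, not the paper's exact implementation; in the paper's pipeline, classifiers would subsequently detect and correct errors that such a greedy rule makes.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpeechSegment:
    text: str
    start: float  # seconds
    end: float

@dataclass
class PenStroke:
    stroke_id: int
    start: float
    end: float

def temporal_overlap(seg: SpeechSegment, stroke: PenStroke) -> float:
    """Length of the time interval shared by a segment and a stroke."""
    return max(0.0, min(seg.end, stroke.end) - max(seg.start, stroke.start))

def align_strokes_to_segments(
    segments: List[SpeechSegment], strokes: List[PenStroke]
) -> List[Optional[SpeechSegment]]:
    """Assign each stroke to the speech segment it overlaps most in time.

    Strokes that overlap no segment are left unaligned (None). This is only
    the rough first pass; error detection and correction would follow.
    """
    alignment: List[Optional[SpeechSegment]] = []
    for stroke in strokes:
        best = max(segments, key=lambda seg: temporal_overlap(seg, stroke), default=None)
        if best is not None and temporal_overlap(best, stroke) > 0.0:
            alignment.append(best)
        else:
            alignment.append(None)
    return alignment
```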