Abstract

Speech and gesture are two primary modes of natural human communication; hence, they are important inputs for a multimodal interface to process. One of the challenges for multimodal interfaces is accurately recognizing the words in spontaneous speech. This difficulty stems partly from the presence of speech repairs, which seriously degrade the accuracy of current speech recognition systems. Based on the assumption that speech and gesture arise from the same thought process, we would expect to find gesture patterns co-occurring with speech repairs that a multimodal processing system could exploit to process spontaneous speech more effectively. To evaluate this hypothesis, we conducted a measurement study of gesture and speech repair data extracted from videotapes of natural dialogs. Although we found that gestures do not always co-occur with speech repairs, we observed that modification gesture patterns correlate highly with content replacement speech repairs but rarely occur with content repetitions. These results suggest that gesture patterns can help classify different types of speech repairs so that they can be corrected more accurately.
