Abstract

Short Message Service (SMS) messaging is very popular, especially among teens. Because research has shown that SMS messaging while driving results in a 35% slower reaction time than being intoxicated [7], states, governments, and even cell phone carriers have launched campaigns to discourage and ban messaging while driving [1]. At the same time, automotive infotainment systems such as the Ford Sync [2] now let drivers hear incoming messages using text-to-speech (TTS). But how can users safely respond to these messages while driving?
Automatic speech recognition (ASR) affords automobile drivers a hands-free, eyes-free method of replying to SMS messages. In [3], we examined three approaches to leveraging ASR for SMS replies: dictation using a language model trained on SMS responses, canned responses using a probabilistic context-free grammar (PCFG), and a voice search approach based on template matching. Voice search proceeds in two steps [6]: an utterance is first converted into text, which is then used as a search query to find the most similar items in an index. For SMS replies, we created an index of SMS response templates, with slots for concepts such as time and place, from a large SMS data collection. After convolving recorded SMS replies so that the audio would exhibit the acoustic characteristics of in-car recognition, we compared how the three approaches handled the convolved audio with respect to the top n-best reply candidates. The voice search approach consistently outperformed dictation and canned responses, achieving task completion as high as 89.7% within the top 5 reply candidates.
Even if the voice search approach is more robust to in-car noise, this does not guarantee that it is more usable. Indeed, users may have difficulty verifying whether SMS response templates match their intended meaning, especially while driving.
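The abstract does not say how an utterance's text is matched against the template index. As a minimal sketch, assuming a bag-of-words cosine similarity and hand-written regular expressions for the time and place slots (both hypothetical stand-ins, not the method of [3] or [6]), the search step and the top-n task completion measure can be illustrated as follows. The ASR step is assumed to have already produced a text hypothesis, and the templates, `TemplateIndex`, and `task_completion_at_n` are illustrative names only.

```python
import math
import re
from collections import Counter

# Hypothetical slot taggers: hand-written regexes standing in for whatever
# concept recognition a real system uses for the time and place slots.
SLOT_PATTERNS = {
    "<time>": re.compile(r"\b\d{1,2}(:\d{2})?\s*(am|pm)?\b|\b(noon|midnight)\b"),
    "<place>": re.compile(r"\b(home|work|school|the office)\b"),
}


def tokens(text):
    """Lowercase text, replace recognized concepts with slot tokens, and tokenize."""
    text = text.lower()
    for slot, pattern in SLOT_PATTERNS.items():
        text = pattern.sub(slot, text)
    return re.findall(r"<\w+>|[a-z']+", text)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TemplateIndex:
    """Index of SMS response templates, searched with an ASR text hypothesis."""

    def __init__(self, templates):
        self.templates = list(templates)
        self.vectors = [Counter(tokens(t)) for t in self.templates]

    def search(self, asr_text, n=5):
        """Second voice search step: return the n templates most similar to the query."""
        query = Counter(tokens(asr_text))
        scored = [(cosine(query, vec), tmpl) for vec, tmpl in zip(self.vectors, self.templates)]
        # Python's sort is stable even with reverse=True, so ties keep index order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tmpl for _, tmpl in scored[:n]]


def task_completion_at_n(trials, n):
    """Fraction of (ranked_candidates, intended_template) trials whose intended
    template appears among the top n candidates."""
    if not trials:
        return 0.0
    hits = sum(1 for ranked, intended in trials if intended in ranked[:n])
    return hits / len(trials)


index = TemplateIndex([
    "i'll be there at <time>",
    "running late, see you at <place>",
    "can't talk now, i'm driving",
    "ok, sounds good",
    "i'm on my way to <place>",
])
# The first step (ASR) is assumed done: this string stands in for recognized text.
# The top candidate is "i'll be there at <time>".
print(index.search("I'll be there at 5 pm", n=2))
```

Swapping the cosine score for a learned ranker, or the regexes for a statistical concept tagger, would leave the two-step structure unchanged.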
Using a high-fidelity driving simulator, in [4] we compared the voice search approach to the dictation approach in terms of both driving performance and task performance measures. Although the two approaches did not differ in driving performance, users made five times more errors on average with dictation than with voice search. Hence, verifying whether SMS response templates match the meaning of an intended reply is much less prone to error than deciphering the sometimes nonsensical misrecognitions of dictation. And because prior research [5] has shown that ASR errors with in-car speech interfaces negatively impact driving performance, the voice search approach may be the safest way to respond to SMS messages in automobiles.
For MIAA, we will demonstrate a multimodal interface for SMS replies based on voice search, as shown in Figure 1.
