Abstract

A voice user interface (VUI) is the script for a conversation between an automated system and a user. This script contains all the utterances that the automated system speaks to the user, along with the logic that decides which utterance to speak in response to user input. Underlying the VUI is speech recognition technology, which captures and decodes the user's spoken input so that the system can “understand” what the user has said. Users come into automated conversations with a set of expectations about how spoken conversation should work and how to behave as a cooperative speaker and listener. The overwhelming majority of users' conversational experience comes from unscripted human-to-human speech, which is far beyond the capabilities of today's speech technology. Most automated conversations between a user and a VUI take place over the phone, because most speech technology is currently deployed in speech-enabled interactive voice response (IVR) systems. VUIs are also being added to mobile and hand-held devices, in-vehicle navigation systems, and desktop computer applications. Commercial speech recognition applications are largely bottom-up systems: they achieve understanding by statistically matching the user's spoken input against stored acoustic models of speech sounds, words, and pronunciations. Human beings, by contrast, rely heavily on top-down knowledge about meaning and context when recognizing speech.
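The abstract describes a VUI as a script of system utterances plus the logic that picks the next utterance from the user's recognized input. The Python sketch below illustrates that idea under stated assumptions. It is not the author's design. The prompts, the phrase-to-intent grammar, and the rule of transferring to an agent after two consecutive recognition failures are all hypothetical. The speech recognizer is modeled simply as something that returns a recognized string, or `None` when nothing matched.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical "script": every utterance the system can speak, keyed by name.
PROMPTS = {
    "welcome": "Welcome. Would you like to check a balance or make a payment?",
    "balance": "Your balance is ready. Anything else?",
    "payment": "Okay, let's make a payment.",
    "reprompt": "Sorry, I didn't catch that. Please say 'balance' or 'payment'.",
    "transfer": "Let me transfer you to an agent.",
}

# Hypothetical grammar: phrases the recognizer may return, mapped to intents.
GRAMMAR = {
    "balance": "balance",
    "check balance": "balance",
    "payment": "payment",
    "make a payment": "payment",
}

# Assumed limit on consecutive failures before escalating to a human.
MAX_FAILURES = 2


@dataclass
class DialogState:
    failures: int = 0  # consecutive turns with no usable recognition result


def next_prompt(state: DialogState, recognized: Optional[str]) -> str:
    """Return the key of the next prompt, given the recognizer's output.

    `recognized` is the recognized phrase, or None if the recognizer
    returned no match.
    """
    intent = GRAMMAR.get(recognized.strip().lower()) if recognized else None
    if intent is not None:
        state.failures = 0  # a successful turn resets the error count
        return intent
    state.failures += 1
    return "transfer" if state.failures >= MAX_FAILURES else "reprompt"


if __name__ == "__main__":
    state = DialogState()
    print(PROMPTS["welcome"])
    # Simulated recognizer results for three user turns.
    for heard in ["check balance", "mumble", None]:
        print(PROMPTS[next_prompt(state, heard)])
```

Keeping the prompts separate from the selection logic mirrors the abstract's split between the utterances the system speaks and the logic that chooses among them. Under these assumptions, a caller who says "check balance", then something unrecognized, then nothing at all hears the balance prompt, a reprompt, and finally the transfer message.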
