Abstract

Autonomous speech-enabled applications such as speech-to-speech machine translation, conversational agents, and spoken dialogue systems need to distinguish system-directed user input from “off-talk” in order to function appropriately. “Off-talk” occurs when users speak to themselves or to others, often causing the system to mistakenly respond to speech that was not directed at it. Automatic detection of off-talk could help prevent such errors and make the user’s interaction with the system more natural. It has been observed that speech in human-human dialogue and in soliloquy is prosodically different from speech directed at machines, and that the right hemisphere of the human brain is the locus of control of speech prosody. In this study, we explore human brain activity prior to speech articulation, alone and in combination with prosodic features, to build models for off-talk prediction. Because the proposed EEG-based models operate on signals preceding articulation, they are a step toward faster detection of system-directed speech than audio-based methods allow, opening new possibilities for integrating brain-computer interface techniques into interactive speech systems.