Abstract

Conversational agents provide powerful opportunities to interact and engage with users. The challenge is to create naturalistic behaviors that replicate the complex gestures observed during human interactions. Previous studies have relied on rule-based frameworks or data-driven models to generate appropriate gestures that are properly synchronized with the underlying discourse functions. Among these methods, speech-driven approaches are especially appealing given the rich information conveyed by speech. Speech captures emotional cues and prosodic patterns that are important for synthesizing behaviors (i.e., modeling the variability and complexity of behavior timing). The main limitation of these models is that they fail to capture the underlying semantic and discourse functions of the message (e.g., nodding). This study proposes a speech-driven framework that explicitly models discourse functions, bridging the gap between speech-driven and rule-based models. The approach is based on a dynamic Bayesian network (DBN), in which an additional node is introduced to constrain the models by specific discourse functions. We implement the approach by synthesizing head and eyebrow motion. We conduct perceptual evaluations to compare the animations generated using the constrained and unconstrained models.
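To make the constrained-model idea concrete, the sketch below lays out one plausible per-slice structure for such a DBN: speech features condition a hidden state that emits head and eyebrow motion, and the constrained variant adds a discourse-function node that also conditions the hidden state. This is a minimal illustration only; the node names and edge set are assumptions for exposition, not the paper's actual variables.

```python
# Minimal structural sketch of the constrained-vs-unconstrained DBN idea.
# Assumption: node names ("Speech", "DiscourseFn", "Hidden", "Motion") and the
# exact edge set are illustrative, not taken from the paper.

# Edges for one time slice t and the transition to slice t+1.
# Unconstrained (speech-driven only): hidden states are driven by speech alone.
unconstrained_edges = [
    ("Speech_t",  "Hidden_t"),     # acoustic features condition the hidden state
    ("Hidden_t",  "Motion_t"),     # hidden state emits head/eyebrow motion
    ("Hidden_t",  "Hidden_t+1"),   # temporal dynamics across slices
]

# Constrained: an additional discourse-function node also conditions the hidden
# state, so the synthesized motion reflects the intended discourse function as
# well as the prosody.
constrained_edges = unconstrained_edges + [
    ("DiscourseFn_t", "Hidden_t"),
    ("DiscourseFn_t", "DiscourseFn_t+1"),
]

if __name__ == "__main__":
    for name, edges in [("unconstrained", unconstrained_edges),
                        ("constrained", constrained_edges)]:
        print(f"{name} DBN slice structure:")
        for parent, child in edges:
            print(f"  {parent} -> {child}")
```

In the unconstrained model the discourse node is simply absent, which is the comparison the perceptual evaluations are designed to probe.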
