Abstract
During face-to-face interactions, people naturally integrate nonverbal behaviors such as facial expressions and body postures as part of the conversation to infer the communicative intent or emotional state of their interlocutor. The interpretation of these nonverbal behaviors is often contextualized by interactional cues such as the previous spoken question, the general discussion topic, or the physical environment. A critical step in creating computers able to understand or participate in this type of social face-to-face interaction is to develop a computational platform that synchronously recognizes nonverbal behaviors as part of the interactional context. In such a platform, information from the acoustic and visual modalities should be carefully synchronized and rapidly processed. At the same time, contextual and interactional cues should be remembered and integrated to better interpret nonverbal (and verbal) behaviors. In this article, we introduce a real-time computational framework, MultiSense, which offers flexible and efficient synchronization approaches for context-based nonverbal behavior analysis. MultiSense is designed to utilize interactional cues from both interlocutors (e.g., from the computer and the human participant) and to integrate this contextual information when interpreting nonverbal behaviors. MultiSense can also assimilate behaviors over a full interaction and summarize the observed affective states of the user. We demonstrate the capabilities of the new framework with a concrete use case from the mental health domain, where MultiSense is used as part of a decision support tool to assess indicators of psychological distress such as depression and post-traumatic stress disorder (PTSD). In this scenario, MultiSense not only infers psychological distress indicators from nonverbal behaviors but also broadcasts the user state in real time to a virtual agent (i.e., a digital interviewer) designed to conduct semi-structured interviews with human participants. Our experiments show the added value of our multimodal synchronization approaches and also demonstrate the importance of MultiSense's contextual interpretation when inferring distress indicators.
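To make the synchronization requirement described above concrete, the sketch below shows a minimal timestamp-based aligner for two modality streams (audio and video). It is an illustration only, not MultiSense's actual implementation; all names (Frame, SyncBuffer, pop_aligned) and the 33 ms tolerance are hypothetical choices for the example.

```python
# Illustrative sketch: pairing audio and video frames by timestamp.
# Hypothetical names; not part of the MultiSense framework itself.
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class Frame:
    timestamp: float  # seconds since session start
    payload: dict     # modality-specific features (e.g., pitch, facial landmarks)


class SyncBuffer:
    """Pairs frames from two streams whose timestamps differ by at most `tolerance`."""

    def __init__(self, tolerance: float = 0.033):  # roughly one video frame at 30 fps
        self.tolerance = tolerance
        self.audio: Deque[Frame] = deque()
        self.video: Deque[Frame] = deque()

    def push_audio(self, frame: Frame) -> None:
        self.audio.append(frame)

    def push_video(self, frame: Frame) -> None:
        self.video.append(frame)

    def pop_aligned(self) -> Optional[Tuple[Frame, Frame]]:
        """Return the next (audio, video) pair within tolerance, discarding stale frames."""
        while self.audio and self.video:
            a, v = self.audio[0], self.video[0]
            dt = a.timestamp - v.timestamp
            if abs(dt) <= self.tolerance:
                self.audio.popleft()
                self.video.popleft()
                return a, v
            # Drop whichever frame is older; it can no longer be matched.
            if dt < 0:
                self.audio.popleft()
            else:
                self.video.popleft()
        return None


if __name__ == "__main__":
    buf = SyncBuffer()
    buf.push_audio(Frame(0.010, {"pitch": 180.0}))
    buf.push_video(Frame(0.033, {"smile": 0.2}))
    buf.push_audio(Frame(0.040, {"pitch": 175.0}))
    # Pairs the 0.010 s audio frame with the 0.033 s video frame (within 33 ms).
    print(buf.pop_aligned())
```

A real-time system would additionally bound buffer sizes and attach the aligned pairs to the current interactional context (e.g., the question just asked by the virtual interviewer) before passing them to downstream behavior recognizers.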