Abstract
In human-human interaction, people tend to adapt to each other as the conversation progresses, mirroring each other's intonation, speech rate, fundamental frequency, word choice, hand gestures, and head movements. This phenomenon is variously referred to as synchrony, convergence, entrainment, or adaptation. Recent studies have investigated it across different dimensions and levels, but mostly for single modalities. How modalities interact at a local level, and how that interplay can be used to study synchrony between conversational partners, remains an open question. This paper studies synchrony in dyadic conversations with a multimodal approach based on sequential pattern mining, operating on both acoustic and text-based features at a local level. The proposed data-driven framework identifies frequent sequences containing events from multiple modalities that can quantify the synchrony between conversational partners (e.g., a speaker reduces speech rate when the other utters disfluencies). The evaluation relies on 90 sessions from the Fisher corpus, which comprises telephone conversations between two people. Using this framework, we develop a multimodal metric to quantify synchrony between conversational partners and report initial results by comparing actual dyadic conversations with sessions artificially created by randomly pairing the speakers.
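To make the general idea concrete, the sketch below is a minimal, hypothetical illustration (not the authors' actual mining algorithm): it interleaves the two partners' event streams, counts cross-speaker event patterns, scores a session by how often such patterns recur, and compares real dyads against artificial dyads built by randomly re-pairing speakers. The event labels, the adjacent-pair (bigram) window, and the score definition are assumptions made only for this example.

```python
# Minimal sketch, assuming each session is a time-ordered list of
# (speaker_id, event_label) tuples such as ("B", "disfluency") or
# ("A", "slower_rate"). All names and thresholds here are illustrative.
from collections import Counter
import random

Session = list[tuple[str, str]]  # (speaker_id, event_label)


def cross_speaker_bigrams(session: Session) -> Counter:
    """Count adjacent event pairs produced by *different* speakers."""
    counts: Counter = Counter()
    for (spk1, ev1), (spk2, ev2) in zip(session, session[1:]):
        if spk1 != spk2:  # keep only patterns that span both partners
            counts[(ev1, ev2)] += 1
    return counts


def synchrony_score(session: Session, min_support: int = 2) -> float:
    """Toy metric: fraction of cross-speaker event pairs that belong to a
    pattern occurring at least `min_support` times in the session."""
    counts = cross_speaker_bigrams(session)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    frequent = sum(c for c in counts.values() if c >= min_support)
    return frequent / total


def random_pairing_baseline(events_by_speaker: dict[str, list[tuple[float, str]]],
                            n_pairs: int = 100, seed: int = 0) -> float:
    """Average score over artificial dyads: pair event streams (with
    timestamps) of speakers who never actually talked to each other."""
    rng = random.Random(seed)
    speakers = list(events_by_speaker)
    scores = []
    for _ in range(n_pairs):
        a, b = rng.sample(speakers, 2)
        merged = sorted(
            [(t, a, ev) for t, ev in events_by_speaker[a]] +
            [(t, b, ev) for t, ev in events_by_speaker[b]]
        )
        scores.append(synchrony_score([(spk, ev) for _, spk, ev in merged]))
    return sum(scores) / len(scores)
```

Under these assumptions, a real dyad would be expected to score higher than the randomly paired baseline if its partners genuinely entrain; the paper's actual framework mines richer multimodal sequential patterns rather than simple adjacent pairs.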