Acoustic-prosodic Features Research Articles

While human tutors respond both to what a student says and to how the student says it, most tutorial dialogue systems cannot detect the student emotions and attitudes underlying an utterance. We present an empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor and one with a computer tutor. We first annotate student turns for negative, neutral, and positive student states in both corpora. We then automatically extract acoustic–prosodic features from the student speech, and lexical items from the transcribed or recognized speech. We compare the results of machine learning experiments using these features alone, in combination, and with student- and task-dependent features, to predict student states. We also compare our results across human–human and human–computer spoken tutoring dialogues. Our results show significant improvements in prediction accuracy over relevant baselines, and provide a first step towards enhancing our intelligent tutoring spoken dialogue system to automatically recognize and adapt to student states.

The design of robust interfaces that process conversational speech is a challenging research direction, largely because users' spoken language is so variable. This research explored a new dimension of speaker stylistic variation by examining whether users' speech converges systematically with the text-to-speech (TTS) heard from a software partner. To pursue this question, a study was conducted in which twenty-four 7- to 10-year-old children conversed with animated partners that embodied different TTS voices. An analysis of children's amplitude, durational features, and dialogue response latencies confirmed that they spontaneously adapt several basic acoustic-prosodic features of their speech by 10–50%, with the largest adaptations involving utterance pause structure and amplitude. Children's speech adaptations were relatively rapid, bidirectional, and dynamically readaptable when introduced to new partners, and generalized across different types of users and TTS voices. Adaptations also occurred consistently, with 70–95% of children converging with their partner's TTS, although individual differences in magnitude of adaptation were evident. In the design of future conversational systems, users' spontaneous convergence could be exploited to guide their speech within system processing bounds, thereby enhancing robustness. Adaptive system processing could yield further significant performance gains. The long-term goal of this research is the development of predictive models of human-computer communication to guide the design of new conversational interfaces.

Articles published on Acoustic-prosodic Features

Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors

Natural head motion synthesis driven by acoustic prosodic features

Toward adaptive conversational interfaces
