Abstract

In this study, we investigated the relationship between turn-taking and prosody. We hypothesized that, to interact smoothly in real-time communication, speakers must show presignals of turn-taking as prosodic features before turn edges. We attempted to discriminate turn changes by the decision tree method using only prosodic features of turn-final accentual phrases, which cover positions earlier than the turn-final mora. In the discrimination experiment, we used a corpus of Japanese spontaneous dialogue and defined prosodic parameters such as F0 contour, power contour and duration. We compared two parameter conditions: using the parameters with and without the final mora of turns. The accuracy under the condition that excludes the parameters of the final mora is 80%, which is not significantly worse than the 83% obtained when using all parameters. Taking into account that only prosody was used, we consider this result to be reasonably good.

1. Introduction

In real-time communication, we can interact very smoothly using speech. There has been a growing appreciation of the important role of prosody in human-human, and also in human-machine, communication [1, 2]. Prosody has functions that enable listeners to achieve real-time and easy understanding, and to control dialogue smoothly. Making effective use of prosodic information can therefore be expected to improve the technologies of speech understanding, speech synthesis, and spoken dialogue systems.

In this study, we focus on the dialogue management functions of prosody with respect to turn-taking. There have been many previous studies on turn-taking and prosodic information in various research fields. For example, intonation patterns at sentence boundaries are relevant to modality and discourse functions [3, 4, 5, 6, 7, 8].
From another point of view, for practical applications such as human-machine dialogue systems, prosodic features are used to detect suitable timing for turn-taking or backchannels [9, 10, 11, 12]. Most of these studies have shown that particular combinations of lexical, syntactic and prosodic information in turn-final position can function as cues signalling that a speaker wants to keep the floor or wants to end the turn.

In order to judge whether it is possible to take the turn or not, however, the hearer does not necessarily have to perceive the speaker’s utterance to the last phoneme. We observed that turn-taking proceeded very smoothly with minimal delay between consecutive speaking turns. In some cases, there were also successful turn transitions with short overlap, called “latching”. Taking these phenomena into account, we consider that, in addition to the above cues at the edges of turns, more global cues to turn-taking exist. Speakers might show presignals as prosodic features before the turn edge, so that listeners know clearly, before a possible transition point, whether a speaker wants to finish a speaking turn. Moreover, in our previous study, we evaluated the effectiveness of prosodic features at earlier positions for estimating syntactic structure [13, 14].

Thus, in the present study, we aim to treat prosodic functions more positively with respect to turn-taking. We consider that prosodic information might have the potential to express a speaker’s attitude regarding turn-taking more strongly. From this point of view, we attempt to judge whether the turn changed or not using only prosodic features. We focused on the final phrases of utterances, which include not only the turn-final mora but also earlier positions, and also attempted discrimination under the condition of using all features except those of the final mora. We used the contours, heights and peaks of F0 and power, together with duration and speaking rate, as the prosodic parameters.
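
The two parameter conditions described above (all features versus all features except the final mora) can be illustrated with a short sketch. It is not the parameter set used in this study: the per-mora record layout (`f0`, `power`, `dur`), the function name `phrase_features`, the crude end-minus-start slope used as a stand-in for a contour, and the numeric values are all assumptions made for illustration only.

```python
from statistics import mean


def phrase_features(moras, use_final_mora=True):
    """Summarise one turn-final accentual phrase as a flat feature dict.

    `moras` is a list of dicts, one per mora in temporal order, with the
    hypothetical keys 'f0' (Hz), 'power' (dB) and 'dur' (seconds).
    With use_final_mora=False the last mora is dropped before any
    feature is computed, mimicking the "without final mora" condition.
    """
    span = moras if use_final_mora else moras[:-1]
    if not span:
        raise ValueError("no moras left to compute features from")

    f0 = [m["f0"] for m in span]
    power = [m["power"] for m in span]
    total_dur = sum(m["dur"] for m in span)

    return {
        "f0_mean": mean(f0),                  # height
        "f0_peak": max(f0),                   # peak
        "f0_slope": f0[-1] - f0[0],           # crude contour: end minus start
        "power_mean": mean(power),
        "power_peak": max(power),
        "power_slope": power[-1] - power[0],
        "duration": total_dur,
        "rate": len(span) / total_dur,        # speaking rate in moras per second
    }


# Made-up values for a four-mora phrase; not taken from any corpus.
example_phrase = [
    {"f0": 220.0, "power": 65.0, "dur": 0.10},
    {"f0": 200.0, "power": 63.0, "dur": 0.10},
    {"f0": 180.0, "power": 60.0, "dur": 0.10},
    {"f0": 150.0, "power": 52.0, "dur": 0.20},
]

all_params = phrase_features(example_phrase, use_final_mora=True)
no_final_mora = phrase_features(example_phrase, use_final_mora=False)
```

Feature vectors built this way for each turn-final phrase, labelled as turn change or turn hold, could then be passed to any decision tree learner once per condition, and the two accuracies compared.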
