To engage naturally in child-robot interactions, dialogue systems must understand the intentions behind children's utterances while accounting for their distinctive linguistic characteristics, such as syntactic incompleteness, pronunciation inaccuracies, and creative expressions. Even state-of-the-art large language models (LLMs), despite their language understanding and contextual awareness, cannot comprehend children's intent as accurately as humans because of these characteristics. To improve its intention reasoning in verbal interactions with children, an LLM-based dialogue system should therefore acquire the way humans understand children's speech. To this end, we propose a fine-tuning methodology that uses two kinds of data: LLM-human judgment discrepancy data and interactive response data. The former comprise cases in which the LLM and human judgments of the contextual appropriateness of a child's answer to a robot's question diverge. The latter comprise LLM-generated robot responses suited to the intentions of children's utterances. We fine-tuned a dialogue system on these datasets so that it interprets children's utterances in a human-like manner and responds adaptively. The system was evaluated through human assessment using the Robotic Social Attributes Scale (RoSAS) and the Sensibleness and Specificity Average (SSA) metrics. The results indicate that it supports effective interpretation of children's utterance intentions and enables natural verbal interactions, even in cases with syntactic incompleteness and mispronunciations.