Abstract

Purpose This study evaluated the performance and acceptance of responses generated by ChatGPT (GPT-3.5 and GPT-4) to Japanese childcare-related questions in order to assess their potential applicability and limitations in the childcare field, focusing on the accuracy, usefulness, and empathy of the generated answers.
Methods We evaluated answers in Japanese generated by GPT-3.5 and GPT-4 for two types of childcare-related questions. ① For the written examination questions of Japan's national childcare worker examination for fiscal year 2023, we calculated the correct answer rates against the official answers. ② From the child-rearing questions posted on the Japanese National Childcare Workers Association's website, we selected one question from each of the seven categories and had GPT-3.5 and GPT-4 generate answers. These were evaluated alongside the existing answers written by human childcare workers. Five childcare workers, blinded to the source, selected what they considered the best of the three answers to each question and rated each answer on a five-point scale for 'accuracy,' 'usefulness,' and 'empathy.'
Results Of the 160 written examination questions, four were omitted due to copyright concerns and one was deemed invalid because of flaws in the question itself. Both GPT-3.5 and GPT-4 produced responses to all of the remaining 155 questions, with correct answer rates of 30.3% for GPT-3.5 and 47.7% for GPT-4 (p<0.01). For the child-rearing Q&A questions, the answers written by human childcare workers were chosen as the best answer most frequently (45.7%), followed by GPT-3.5 (31.4%) and GPT-4 (22.9%). GPT-3.5 received the highest average rating for accuracy (3.69 points), whereas the human childcare worker answers received the highest average ratings for usefulness and empathy (both 3.57 points).
Conclusions Neither GPT-3.5 nor GPT-4 met the passing criteria of Japan's national childcare worker examination. For the child-rearing questions, GPT-3.5 was rated highest in accuracy despite its lower correct answer rate on the examination. ChatGPT-generated answers (GPT-3.5 and GPT-4 combined) were chosen as the best answer in over half of the selections (54.3%), yet concerns about accuracy were observed, highlighting the potential risk of incorrect information in the Japanese context.
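
The reported examination result can be sanity-checked with a short calculation. The sketch below makes two assumptions that the abstract does not state. First, the counts of correct answers (47 and 74) are inferred by rounding the reported percentages over 155 questions. Second, a pooled two-proportion z-test is used as a stand-in, because the abstract does not name the test behind p<0.01. Both models answered the same questions, so a paired test such as McNemar's would arguably fit better, but it needs per-question results that are not given here.

```python
import math

# Questions actually scored: 160 total, minus 4 omitted and 1 invalid.
N_QUESTIONS = 160 - 4 - 1

# ASSUMPTION: correct-answer counts are inferred from the reported rates
# (30.3% and 47.7%); the abstract does not give raw counts.
GPT35_CORRECT = round(0.303 * N_QUESTIONS)
GPT4_CORRECT = round(0.477 * N_QUESTIONS)


def two_proportion_z_test(correct_a, correct_b, n):
    """Pooled two-proportion z-test for two groups of equal size n.

    ASSUMPTION: this is an illustrative choice of test, not necessarily
    the one used in the study.
    Returns (z, two-sided p-value).
    """
    p_a = correct_a / n
    p_b = correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_b - p_a) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(GPT35_CORRECT, GPT4_CORRECT, N_QUESTIONS)
print(f"n = {N_QUESTIONS}, z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumptions the difference between the two models is significant at the 0.01 level, which is consistent with the reported p<0.01.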
