Large Language Models (LLMs) are increasingly used in educational settings to assist students with assignments and with learning new concepts. For LLMs to be effective learning aids, students must develop an appropriate level of trust in and reliance on these tools: miscalibrated trust and reliance can lead to suboptimal learning outcomes and reduced engagement with LLMs. Despite their growing presence, there is limited understanding of how to achieve appropriate transparency and reliance calibration in the educational use of LLMs. In a 3×2 between-subjects experiment conducted in a university classroom setting, we tested the effects of two transparency disclosures (System Prompt and Goal Summary) and an in-conversation Reliability Disclaimer on a GPT-4-based chatbot tutor provided to students for an assignment. Our findings suggest that disclaimer messages included in the responses may effectively mitigate learners' overreliance on the LLM tutor in the presence of incorrect advice. Disclosing the System Prompt appeared to calibrate students' confidence in their answers and to reduce the frequency with which students copy-pasted the exact assignment question to the LLM tutor. Student feedback indicated a preference for transparency framed in terms of performance-based metrics. Our work provides empirical insights for the design of transparency and reliability mechanisms for using LLMs in classrooms.