Abstract

Machine learning (ML) techniques are commonly viewed as inductive learning procedures: they identify patterns in a specific training dataset to make predictions in novel contexts. Consequently, the performance and generalizability of these techniques depend heavily on the quality and quantity of the available training data. However, gathering a diverse training dataset that captures the many nuances of students’ reasoning is challenging in educational settings because of resource constraints. To address this issue, we compared three data augmentation strategies: collecting additional student data, using chatbots to paraphrase existing responses, and prompting chatbots to generate synthetic responses. We found that data augmentation significantly improved ML model performance. Specifically, combining authentic and/or paraphrased responses with chatbot-generated responses yielded the best machine-human score agreement across various validation conditions. This augmentation also allowed us to expand the scoring rubric we applied by introducing a more detailed categorization that better captured the level of causality in undergraduate chemistry students’ reasoning about reaction mechanisms. Together, these findings highlight effective ways to increase the size and heterogeneity of training data, thereby improving ML model performance and generalizability, enabling a more fine-grained categorization, and reducing the human effort required for data collection. In the future, these benefits may enhance the scalability of formative assessments that adaptively support students’ reasoning in postsecondary chemistry classes.
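
To make the paraphrase-based augmentation strategy concrete, the following is a minimal sketch, not the authors' actual pipeline, of how one might prompt a chatbot to paraphrase existing student responses and fold the paraphrases into a small training set for a baseline text classifier. It assumes an OpenAI-style chat API (via the `openai` Python client) and scikit-learn; the model name, prompt wording, rubric labels, and example responses are illustrative placeholders.

```python
# Hypothetical sketch of chatbot-based paraphrase augmentation; not the
# authors' method. Model, prompt, and labels are illustrative assumptions.
from openai import OpenAI  # assumes the openai>=1.0 client is installed
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def paraphrase(response_text: str, n: int = 3) -> list[str]:
    """Ask a chatbot to paraphrase one student response n times."""
    paraphrases = []
    for _ in range(n):
        completion = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model choice
            messages=[
                {"role": "system",
                 "content": "Paraphrase the student's answer while preserving "
                            "its chemical reasoning and level of causality."},
                {"role": "user", "content": response_text},
            ],
            temperature=0.9,  # higher temperature -> more lexical variety
        )
        paraphrases.append(completion.choices[0].message.content)
    return paraphrases


# Hypothetical labeled data: (student response, rubric category)
authentic = [
    ("The electrons shift to the oxygen because it is more electronegative.",
     "causal"),
    ("The bond just breaks and the product forms.",
     "descriptive"),
]

# Augment: keep each authentic response and add its paraphrases
# under the same rubric label.
augmented = list(authentic)
for text, label in authentic:
    augmented += [(p, label) for p in paraphrase(text)]

# Train a simple baseline classifier on the enlarged training set.
texts, labels = zip(*augmented)
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
```

Generating synthetic responses (the third strategy described above) would look similar, except the prompt would ask the chatbot to produce new answers at a target rubric level rather than to paraphrase an existing one.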
