Generative AI is advancing at a rapid pace and shows immense potential for reshaping teaching and learning. Designing high-quality examinations demands careful consideration, such as incorporating diverse question types to assess different levels of understanding and ensuring alignment with intended learning outcomes. However, many university educators struggle to design effective examinations due to insufficient training. This paper outlines our work in progress on leveraging generative AI, specifically large language models, as a copilot in examination design. In this study, we evaluate large language models' performance in assessing examination questions against four key quality criteria. Preliminary findings indicate that ChatGPT 3.5 achieved performance comparable to the state of the art in classifying questions by cognitive complexity, as defined by Bloom's taxonomy, and showed promising initial results on the other criteria. In conclusion, generative AI shows substantial potential as a valuable tool for helping university educators enhance the overall quality of examination design.
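To make the classification task concrete, the sketch below illustrates one way such an evaluation might be set up: prompting a GPT-3.5-class model to assign a Bloom's taxonomy level to an exam question. This is not the authors' actual protocol; the prompt wording, temperature setting, and model name are illustrative assumptions, and it uses only the standard `openai` Python client (>= 1.0) with an `OPENAI_API_KEY` in the environment.

```python
# Minimal sketch (not the paper's actual protocol) of prompting a GPT-3.5-class
# model to classify an examination question by Bloom's taxonomy level.
# Assumptions: `openai` package >= 1.0, OPENAI_API_KEY set in the environment,
# and an illustrative prompt/model choice.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]

def classify_bloom_level(question: str) -> str:
    """Ask the model to assign exactly one Bloom's taxonomy level to a question."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # near-deterministic output suits a classification task
        messages=[
            {
                "role": "system",
                "content": (
                    "You classify examination questions by Bloom's taxonomy. "
                    f"Answer with exactly one of: {', '.join(BLOOM_LEVELS)}."
                ),
            },
            {"role": "user", "content": f"Question: {question}"},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    q = "Compare and contrast quicksort and mergesort in terms of time complexity."
    print(classify_bloom_level(q))  # e.g. "Analyze"
```

A study like the one described would then compare such model-assigned labels against expert annotations to measure agreement; constraining the model to a fixed label set, as above, keeps the output machine-checkable.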