The development of multiple-choice reading comprehension items targeting specific reading subskills is crucial to the teaching, learning, and testing of reading. However, it remains a challenging task because the process is time-consuming and costly. This study investigates the capability of ChatGPT (Chat Generative Pre-trained Transformer) to generate multiple-choice reading comprehension items. Psychometric models and human review were adopted to evaluate item quality against a benchmark of human-authored items. The results showed that ChatGPT-authored multiple-choice items were acceptable and comparable to human-authored items in terms of psychometric properties, and human review by questionnaire, expert judgment, and interview found that ChatGPT has the potential to serve as a test developer and as an assistant for the teaching and learning of reading. However, some shortcomings and potential pitfalls were also identified, and room for improvement is discussed for applying ChatGPT to item generation for educational purposes.