Abstract
The study applies machine learning (ML) algorithms to investigate the association between the length of a test item written in Chinese (measured by word count), item difficulty, and students’ item perceptions (IPs) in science term examinations. For Research Question 1, items administered to grade 7 students (aged 12–13) in a Taiwanese secondary school from 2014 to 2019 were analyzed. For Research Question 2, the study included 4,916 students from the same population. For Research Question 3, perceptions were gathered from 48 students of the same school in 2020. The results showed, first, that the average word count of the 611 items was 88.81, with an average stem word count of 41.16, an average options word count of 47.66, and a stem-to-options word count ratio (S-O ratio) of 1.27. Second, the ML M5P model-tree algorithm confirmed the predictive power of item length, indicating that the length of an item is a key factor in determining its difficulty. The algorithm classified the items into three word-count categories (<57.5 words, 57.5–91.5 words, and >91.5 words) and generated three linear prediction models of item difficulty (LM1, LM2, and LM3). These models showed that as the length of an item increases, so does its difficulty. In the prediction analysis of students’ IPs, the J48 algorithm yielded the better prediction result, and its output could be converted into understandable rules. IP was the root node of the decision tree, indicating the importance of this variable. Students were therefore more likely to answer an item correctly when 1) the item was perceived as easy or normal, 2) they had high or ordinary learning achievement in science, and 3) the item contained fewer than 71 words. The results can serve as a reference for educators, examiners, and researchers in practical science term examination design, and can guide future research on applying machine learning to analyze the difficulty of items in science assessments.
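To illustrate the kind of modeling described in the abstract, the sketch below shows how an M5P model tree could be built with Weka's Java API to predict item difficulty from word-count features. The file name science_items.arff and the attribute names (stem_word_count, options_word_count, total_word_count, s_o_ratio, difficulty) are hypothetical placeholders rather than the study's actual dataset; M5P is the Weka implementation of the algorithm named in the abstract.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ItemDifficultyM5P {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file: one row per item, with numeric attributes such as
        // stem_word_count, options_word_count, total_word_count, s_o_ratio,
        // and a numeric target attribute "difficulty".
        Instances data = new DataSource("science_items.arff").getDataSet();
        data.setClassIndex(data.attribute("difficulty").index());

        // M5P builds a model tree: word-count splits at the internal nodes and
        // linear regression models (LM1, LM2, ...) at the leaves.
        M5P m5p = new M5P();
        m5p.buildClassifier(data);
        System.out.println(m5p); // prints the splits and the leaf linear models

        // 10-fold cross-validation as one way to check predictive power.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new M5P(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Printing the trained model shows the word-count thresholds at the internal nodes and a linear regression model at each leaf, which is the form in which category cut-offs and linear models such as LM1–LM3 would appear; the cross-validation summary gives an indication of predictive power.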
Highlights
A good test evaluates students’ learning outcomes and provides teachers with insight into whether the defined teaching goals were reached
The results showed that more words in the stem and options increase an item’s difficulty
This study aims to apply machine learning (ML) techniques to analyze test item length, item difficulty, and students’ item perceptions (IPs) in order to develop an item word-count classification, an item difficulty model, and a students’ IP model
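To make the students’ IP model concrete, the following sketch shows how a J48 (C4.5) decision tree could be trained with Weka's Java API on student-level response data to predict whether an item is answered correctly from the student's item perception, achievement level, and the item's word count. The file name item_responses.arff and the attribute names are assumptions for illustration only and are not taken from the study.

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ItemPerceptionJ48 {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file: one row per student-item response, with nominal
        // attributes item_perception {easy,normal,hard} and achievement {high,ordinary,low},
        // a numeric attribute word_count, and a nominal class "answered {correct,incorrect}".
        Instances data = new DataSource("item_responses.arff").getDataSet();
        data.setClassIndex(data.attribute("answered").index());

        // J48 (C4.5) grows a pruned decision tree; Weka's defaults are a confidence
        // factor of 0.25 and a minimum of 2 instances per leaf, set explicitly here.
        J48 j48 = new J48();
        j48.setOptions(new String[] {"-C", "0.25", "-M", "2"});
        j48.buildClassifier(data);
        System.out.println(j48); // prints the tree, whose branches read as if-then rules
    }
}
```

The printed tree can be read as if-then rules analogous to the rule reported in the abstract: an item perceived as easy or normal, high or ordinary achievement, and fewer than 71 words together predicting a correct answer.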
Summary
A good test evaluates students’ learning outcomes and provides teachers with insight into whether the defined teaching goals were reached. Taiwanese society at large, including school teachers, has recently found that multiple-choice test items in nationwide tests (written in Chinese) have become lengthier and more difficult. Various countries have recently included the development of students’ literacy as a national curriculum goal, and the increasing length of items has been attributed to this assessment of students’ literacy, creating a social-educational issue [2]. The researchers have therefore further studied item length so that students can answer items under a reasonable cognitive load. This improves the performance and reliability of the assessment, both of which are vital for its effectiveness.