Abstract
BACKGROUND: Progress tests are longitudinal assessments of students' knowledge based on successive tests. Calibration of test difficulty is challenging, especially because of the tendency of item-writers to overestimate students' performance. The relationships between the levels of Bloom's taxonomy, the ability of test judges to predict the difficulty of test items and the real psychometric properties of test items have been insufficiently studied.
OBJECTIVE: To investigate the psychometric properties of items according to their classification in Bloom's taxonomy and judges' estimates, through an adaptation of the Angoff method.
DESIGN AND SETTING: Prospective observational study using secondary data on students' performance in a progress test applied at ten medical schools, mainly in the state of São Paulo, Brazil.
METHODS: We compared the expected and real difficulty of the items used in a progress test. The items were classified according to Bloom's taxonomy, and their psychometric properties were assessed in relation to taxonomy level and field of knowledge.
RESULTS: There was a 54% match between the panel of experts' expectations and the real difficulty of the items. Items that were expected to be easy had mean difficulty that was significantly lower than that of items that were expected to be medium (P < 0.05) or difficult (P < 0.01). Items with high-level taxonomy had higher discrimination indices than low-level items (P = 0.026). We did not find any significant differences between the fields of knowledge in terms of difficulty or discrimination.
CONCLUSIONS: Our study demonstrated that items with high-level taxonomy performed better in discrimination indices and that a panel of experts may develop coherent reasoning regarding the difficulty of items.
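For readers unfamiliar with the psychometric indices mentioned above, the sketch below shows how item difficulty and discrimination are conventionally computed under classical test theory: difficulty as the proportion of examinees answering correctly, and discrimination as the point-biserial (item-rest) correlation. This is an illustrative reconstruction, not the authors' analysis code, and the simulated responses are hypothetical.

```python
# Illustrative sketch: classical test theory item statistics.
# Difficulty = proportion correct; discrimination = correlation between
# an item's score and the total score on the remaining items.
import numpy as np

def item_statistics(responses: np.ndarray):
    """responses: (n_students, n_items) matrix of 0/1 item scores."""
    n_items = responses.shape[1]
    difficulty = responses.mean(axis=0)        # proportion correct per item
    total = responses.sum(axis=1)
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]         # exclude the item itself
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

# Toy usage with simulated answers (hypothetical data, not study data):
rng = np.random.default_rng(0)
ability = rng.normal(size=500)                 # simulated student abilities
easiness = rng.uniform(-1.5, 1.5, size=40)     # simulated item easiness
probs = 1 / (1 + np.exp(-(ability[:, None] + easiness[None, :])))
scores = (rng.random((500, 40)) < probs).astype(int)
p, r = item_statistics(scores)
print(f"mean difficulty {p.mean():.2f}, mean discrimination {r.mean():.2f}")
```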
Highlights
Assembling a knowledge test can be a challenging task, especially with regard to calibrating the difficulty of the test
We investigated the relationships between the exam judges’ estimates and the classification of the difficulty and discrimination levels of items, using Bloom’s taxonomy in a progress test setting
A total of 4,596 students participated in the test (94.1% of the total population), of whom 4,563 were included in the general psychometric analysis
Summary
Assembling a knowledge test can be a challenging task, especially with regard to calibrating the difficulty of the test. Although many studies have addressed how useful experts' opinions can be, their predictions of difficulty often differ from what students perceive. This uncertainty relates to the multiple factors involved in the cognitive process that is necessary for answering a question and to the tendency of item-writers to overestimate students' performance.[1,2] Questions can require lower or higher levels of cognitive processing, depending on whether students have to recall, minimally understand or apply their knowledge. The relationships between the levels of Bloom's taxonomy, the ability of test judges to predict the difficulty of test items and the real psychometric properties of test items have been insufficiently studied.
CONCLUSIONS: Our study demonstrated that items with high-level taxonomy performed better in discrimination indices and that a panel of experts may develop coherent reasoning regarding the difficulty of items.
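As a hedged illustration of the comparison described above (judges' expected difficulty versus the difficulty actually observed), the following sketch maps each item's observed proportion of correct answers to an easy/medium/difficult band and reports the agreement rate. The 0.3/0.7 cut-offs and the example data are assumptions for illustration only; the paper's exact banding rules are not given here.

```python
# Hedged sketch: checking judges' expected difficulty against reality.
# The 0.3/0.7 proportion-correct cut-offs are illustrative assumptions.
def band(p_correct: float) -> str:
    """Map an item's proportion correct to a difficulty band."""
    if p_correct < 0.3:
        return "difficult"
    if p_correct <= 0.7:
        return "medium"
    return "easy"

def agreement(expected: list[str], p_correct: list[float]) -> float:
    """Fraction of items where the judges' expected band matched reality."""
    matches = sum(e == band(p) for e, p in zip(expected, p_correct))
    return matches / len(expected)

# Hypothetical example: five items, judges' labels vs observed proportions.
print(agreement(["easy", "medium", "difficult", "easy", "medium"],
                [0.82, 0.55, 0.22, 0.60, 0.50]))  # -> 0.8
```

In the study itself, the analogous agreement between the expert panel's expectations and the items' real difficulty was 54%.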