Difficulty Of Test Items Research Articles

The difficulty of mathematical test items is a critical factor in evaluating test quality and educational outcomes. Accurately predicting item difficulty during test creation is therefore important for producing effective test papers. This study used more than ten years of content and score data from China’s Henan Provincial College Entrance Examination in Mathematics as an evaluation criterion for test difficulty; all data were obtained from the Henan Provincial Department of Education. Building on the framework established by the National Center for Education Statistics (NCES) for test item assessment methodology, this paper proposes a new framework of eight features that accounts for the distinctive characteristics of mathematics. It then proposes an XGBoost-based SHAP model for analyzing the difficulty of mathematics tests. By coupling the XGBoost method with the SHAP method, the model not only evaluates the difficulty of mathematics tests but also analyzes the contribution of specific features to item difficulty, thereby increasing transparency and mitigating the “black box” nature of machine learning models. The model achieves a prediction accuracy of 0.99 on the training set and 0.806 on the test set. With the model, we found that parameter-level features and reasoning-level features are significant factors influencing the difficulty of subjective items in the exam. In addition, we divided senior secondary mathematics knowledge into nine units based on Chinese curriculum standards and found significant differences in the distribution of the eight features across these knowledge units, which can help teachers adjust their emphasis from unit to unit during teaching. In summary, our proposed approach significantly improves the accuracy of item difficulty prediction, which is crucial for intelligent educational applications such as knowledge tracking, automatic test item generation, and intelligent paper generation. These results provide tools that are better aligned with and responsive to students’ learning needs, thus effectively informing educational practice.

Purpose: Per the medical education research literature, there remains no gold standard for determining medical education exam cut scores (passing scores) [1]. Most U.S. medical schools use an absolute or relative standard method; each has advantages and limitations. Given the diverse goals schools strive to accomplish when administering exams, neither approach is sufficient. For example, in Phase 1 of our basic science curriculum, we endeavor to develop exams that are valid, reliable, appropriately rigorous, and criterion-referenced; prepare students for the high-stakes testing situations they will encounter in their medical education and graduate medical education programs and State Board exams; and provide a valid means to adjust cut scores due to cohort performance and item difficulty. A combination cut score method allows schools to accomplish such goals in a manner that is also easy to explain and defend to students and faculty. This presentation will discuss the advantages, process, and outcomes of using a combination cut score method for basic science assessment compared with absolute and relative standard setting methods.

Approach: Per Cohen-Schotanus and van der Vleuten’s study [2], we employed a combination cut score method, which uses the best-performing students as the point of reference. Our class size is 60 students per cohort, and our Phase 1 basic science curriculum is delivered via 14 modules. Faculty co-leaders develop module exams via NBME’s Customized Assessment Services. Specifically, for each exam, the cut score is 65% of the 95th percentile student’s score. To date, we have completed 5 module exams with promising psychometric outcomes.

Results: Thus far, we have found that our outcomes are comparable to previous years’ while being more manageable and slightly more consistent. Because we pivoted to administering NBME exams virtually during the pandemic, managing exam/retake creation and online proctoring consumes considerable time and resources. The more predictable and consistent outcomes of the combination method have helped us manage our pandemic contingency plans effectively. Specifically, our exam outcomes include more consistent exam cut scores and numbers of student failures. Across the 14 exams last year, the cut scores ranged from 56 to 65. Across the 5 exams administered thus far this year, the cut scores range from 57 to 65. More significantly, last year’s number of failures ranged from 0 to 5, and this year the number of failures ranges from 1 to 3. The narrower range of cut scores and the increased consistency in the number of student failures are the positive outcomes predicted when the combination cut score method was proposed.

Discussion: Thus far, we have learned that the combination cut score method is producing outcomes as expected with fewer surprises. Students report appreciation for the straightforward nature of the cut score method and our transparency in reporting it at the beginning of the course. The use of consistent cut scores has allayed student and faculty concerns regarding exam validity and reliability, and the more consistent number of failures has allowed our assessment team to support the exam retake process more efficiently and with fewer human resources.

Significance: As medical schools continue to grapple with exam cut score decisions, a combination cut score method offers several advantages. It supports criterion-referenced exams that measure student attainment of objectives while also considering cohort abilities and test item difficulty. It is simple and inexpensive to employ, transparent and easy to understand, yields consistent and manageable results, and is based upon sound research practices.
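The cut score rule stated in the Approach (65% of the 95th percentile student's score) can be illustrated with a minimal sketch. The abstract does not say how the 95th percentile is computed, so the nearest-rank definition used here is an assumption, as are the function names and the cohort scores, which are invented purely for illustration and are not data from the study.

```python
import math


def percentile_nearest_rank(scores, pct):
    """Score at the given percentile, using the nearest-rank method (an assumption)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def combination_cut_score(scores, reference_pct=95, fraction=0.65):
    """Cut score as a fixed fraction of the reference (95th percentile) student's score."""
    return fraction * percentile_nearest_rank(scores, reference_pct)


# Hypothetical cohort of 60 percent-correct scores (illustrative only).
scores = [55, 60, 62] + [64 + i // 2 for i in range(57)]

cut = combination_cut_score(scores)
failures = sum(1 for s in scores if s < cut)
print(f"reference score: {percentile_nearest_rank(scores, 95)}")
print(f"cut score: {cut:.1f}, failures: {failures}")
```

Because the reference point moves with the top of the cohort, a harder exam (lower top scores) lowers the cut score automatically, while the fixed fraction keeps the rule simple to explain.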

Articles published on Difficulty Of Test Items

Development of Computational Mathematic Thinking Test Instruments Based On Computered Based Test

Dimensionality, discrimination power and difficulty of English test items: the case of graduate exam for healthcare applicants

Novel Feature-Based Difficulty Prediction Method for Mathematics Items Using XGBoost-Based SHAP Model

Students with Aberrant Responses in TIMSS2019 Mathematics Test in Sultanate of Oman: Comparison of Between Squared Residual Index by Content, Cognitive Process, Item Type, and Difficulty.

Implementation of the Problem-Based Learning Model in Teaching of Mathematics about Adding Fractions with Different Denominators in Elementary Schools

Health-Related Quality of Life in Patients With Ventricular Assist Device: Psychometric Evaluation of the German Version of the Quality of Life With a Ventricular Assist Device Questionnaire.

Decomposing Difficulty of Reading Literacy Test Items

The Analysis of Chemistry Test Item Difficulty on XI MIPA Year-End Assessment at SMAN 8 Pontianak

The Influence of the Cooperative Learning Model Type Numbered Heads Together (NHT) on the Mathematical Creative Thinking Ability of Students in Class VIII of Trisakti Private Junior High School, Pematang Siantar

The Effect of Using Problem Based Learning Model (PBL) on Mathematics Learning Outcomes of Class VIII Students at SMP Negeri 2 Tapian Dolok

It's easier to relearn gross motor skills than learn them for the first time after injury: Empirical evidence informing the age at injury debate

Estimating the Psychometric Properties (Item Difficulty, Discrimination and Reliability Indices) of Test Items using Kuder-Richardson Approach (KR-20)

THE QUALITY OF SUMMATIVE TEST MADE BY EFL TEACHER

The Influence of Problem Solving Learning Model on Students' Mathematic Problem Solving Ability in SMA Private High School, Nommensen Pematangsiantar Campus

The Influence of Think Pair Share (TPS) Cooperative Learning Model on Students' Mathematical Communication Ability In Class VIII SMP Negeri 12 Pematangsiantar

Challenging to Read, Easy to Comprehend? Effects of Linguistic Demands on Secondary Students’ Text Comprehension in Physics

Pengembangan Model Waktu Respon Untuk Memahami Proses Kognitif Peserta Tes [Development of a Response Time Model to Understand Test Takers' Cognitive Processes]

Role of Figures in Mathematics Problems in Slovak Testing T9

Advantages of a Combination Cut Score Method for Basic Science Assessment.

THE ABILITY OF THE ELEVENTH GRADE STUDENTS OF SMA DELI MURNI BANDAR BARU ON USING COUNTABLE AND UNCOUNTABLE NOUNS IN THE ACADEMIC YEAR OF 2020/2021
