Abstract
Important decisions regarding accountability and the placement of students in performance categories are often made on the basis of test scores; it is therefore important to evaluate the validity of the inferences derived from test results. One threat to the validity of such inferences is aberrant responding, and several person fit indices have been developed to detect aberrant responding on educational and psychological tests. The majority of the person fit literature has focused on creating and evaluating new indices. The aim of this study was to assess, by means of simulation, the effect of aberrant responding on the accuracy of estimated item parameters and to examine whether those estimates can be refined by using person fit statistics. Our results showed that the presence of aberrant response patterns biased both the b and a parameters at the item level and affected the classification of students, particularly high-performing students, into performance categories, regardless of whether the aberrant response patterns were retained in the data or removed. The results differed by test length and by the percentage of students with aberrant response patterns. Practical and theoretical implications are discussed.
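To make the simulation design concrete, the sketch below generates 2PL response data, injects aberrant patterns for a subset of high-ability examinees (careless errors on the easiest items), and shows how the contaminated items then look harder than they really are. It is a minimal illustration only, not the authors' code: the 2PL setup, the careless-error mechanism, the 10% contamination rate, and the use of proportion-correct as a quick stand-in for the estimated b parameter are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items = 2000, 40

# True 2PL item parameters: discrimination a, difficulty b.
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
theta = rng.normal(0.0, 1.0, n_persons)

def p_correct(theta, a, b):
    """2PL probability of a correct response for each person-item pair."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

clean = (rng.uniform(size=(n_persons, n_items)) < p_correct(theta, a, b)).astype(int)

# Inject aberrance: 10% of examinees, drawn from the high-ability quartile,
# answer the 10 easiest items carelessly (at random).
contaminated = clean.copy()
aberrant = rng.choice(np.argsort(theta)[-n_persons // 4:], n_persons // 10, replace=False)
easiest = np.argsort(b)[:10]
contaminated[np.ix_(aberrant, easiest)] = rng.integers(0, 2, (aberrant.size, easiest.size))

# Proportion-correct on the easiest items drops after contamination,
# i.e., those items would be estimated as more difficult than they are.
for label, data in [("clean", clean), ("contaminated", contaminated)]:
    print(label, np.round(data[:, easiest].mean(axis=0), 3))
```

A fuller replication of the study's design would re-estimate the a and b parameters with an IRT package and compare them against the generating values, with and without the flagged examinees.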
Introduction
The use of large-scale tests in educational, psychological, and decision-making contexts has become part of the ongoing activities of many school districts, provinces/states, and countries. Often, important decisions regarding accountability and the placement of students in performance categories (e.g., below basic, basic, proficient, excellent) are made on the basis of the scores these tests generate. It is therefore important to evaluate the validity of the inferences derived from test results, which depends on the measurement model used in the design, the construction of items, the scoring of the students' responses, and the analyses of the scored responses. When the measurement model fails to accurately reflect the real aspects of student responses, the validity of test scores may be compromised. One example of this failure occurs when some students produce unusual or unexpected response patterns. If students answer the more difficult items correctly but fail to answer the easier items successfully, their responses are considered "unexpected", "aberrant", "unpredictable", or "misfitting" [1]. Meijer (1997) and Schmitt, Cortina, and Whitney (1993) suggested that the validity and reliability of test scores may be compromised when such misfitting response patterns are present.
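This excerpt does not specify which person fit indices the study evaluates, but a widely used one is the standardized log-likelihood statistic lz of Drasgow, Levine, and Williams (1985). As a hedged sketch of the general idea, the code below computes lz under a 2PL model with known item parameters and shows that a reversed pattern of the kind just described (easy items wrong, hard items right) produces a large negative value; the toy parameter values and patterns are assumptions made for illustration.

```python
import numpy as np

def lz_statistic(u, theta, a, b):
    """Standardized log-likelihood person fit statistic (lz) under a 2PL.

    u: 0/1 response vector; theta: ability; a, b: item discrimination
    and difficulty. Large negative values indicate misfitting
    (aberrant) response patterns.
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expected) / np.sqrt(variance)

# Toy example: 10 items ordered from easiest to hardest.
a = np.ones(10)
b = np.linspace(-2, 2, 10)

expected_pattern = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # consistent with theta = 0.5
aberrant_pattern = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # easy wrong, hard right

for name, u in [("expected", expected_pattern), ("aberrant", aberrant_pattern)]:
    print(name, round(lz_statistic(u, 0.5, a, b), 2))
```

In practice, ability must itself be estimated from the same responses, which distorts the null distribution of lz; corrected statistics such as Snijders' (2001) lz* account for this.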