Abstract

When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HRs) to be completed is large. Furthermore, such a process leads to decisions that are susceptible to human error. A key finding from behavioral decision-making research is that a parametric model of human decision making often outperforms the decision maker. We exploit this finding by fitting a model that mimics how analysts integrate item-level FT statistics and graphical performance plots, using it to predict the status an analyst would assign to an item. The procedure suggests a set of rules that achieves a desired level of classification accuracy, separating situations in which the evidence supports firm decisions from those that would likely benefit from HRs. Implementation of the decision rules accounts for an estimated 65% reduction in calibrations requiring HRs.
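To make the idea concrete, the sketch below shows one way such a decision-rule surrogate could be set up: a simple classifier is fit to historical analyst decisions, and items whose predicted probability falls between two confidence cut-offs are routed to human review. The features, cut-offs, and choice of logistic regression are illustrative assumptions, not the model or rules developed in the paper.

```python
# Minimal sketch: a surrogate classifier for analyst item-status decisions.
# Feature names, thresholds, and logistic regression are illustrative
# assumptions, not the procedure described in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical historical data: item-level FT statistics and the status
# an analyst assigned (1 = accept, 0 = reject/flag).
X = rng.normal(size=(500, 3))  # e.g., difficulty, discrimination, fit statistic
y = (X @ np.array([0.8, 1.2, -1.5]) + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def route_item(features, lo=0.10, hi=0.90):
    """Return an automatic decision when the model is confident,
    otherwise route the item to human review (HR)."""
    p = model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    if p >= hi:
        return "accept"
    if p <= lo:
        return "reject"
    return "human review"

# Apply the rules to new FT calibrations and track the HR load.
new_items = rng.normal(size=(200, 3))
decisions = [route_item(x) for x in new_items]
hr_rate = decisions.count("human review") / len(decisions)
print(f"Items routed to HR: {hr_rate:.0%}")
```

Tightening or widening the two cut-offs trades automatic-classification accuracy against the share of calibrations that still require an HR; the 65% reduction reported above corresponds to the cut-offs chosen in the paper's procedure.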
