Abstract

Item response theory (IRT) model applications extend well beyond cognitive ability testing, and various patient-reported outcomes (PRO) measures are among the more prominent examples. PRO (and similar) constructs differ from cognitive ability constructs in many ways, and these differences have model-fitting implications. With a few notable exceptions, however, most IRT applications to PRO constructs rely on traditional IRT models, such as the graded response model. We review some notable differences between cognitive and PRO constructs and how these differences can present challenges for traditional IRT model applications. We then apply two models (the traditional graded response model and an alternative log-logistic model) to depression measure data drawn from the Patient-Reported Outcomes Measurement Information System project. We do not claim that one model is “a better fit” or more “valid” than the other; rather, we show that the log-logistic model may be more consistent with the construct of depression as a unipolar phenomenon. Clearly, the graded response and log-logistic models can lead to different conclusions about the psychometrics of an instrument and the scaling of individual differences. We underscore, too, that questions of which model is more appropriate generally cannot be decided by fit index comparisons alone; such decisions may require integrating psychometrics with theory and research findings on the construct of interest.

Highlights

  • Item response theory (IRT) models were developed to solve practical testing problems in large-scale, multiple-choice, cognitive aptitude testing

  • Alternative models for handling non-normality have been proposed, but none is applied routinely; we argue that the problems in fitting IRT models to patient-reported outcomes (PRO) constructs that stem from latent trait distributional issues extend well beyond adjusting parameter estimates for “non-normality”

  • Although the parameters of the graded response model (GRM) and the log-logistic (LL) model are nonlinear transformations of each other, and the two models imply equal response propensities and equivalent correlation matrices, they can lead to very different interpretations of an instrument's psychometric properties (see the sketch below this list)
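
To make the last highlight concrete, the sketch below compares cumulative response propensities for a single hypothetical item under the two models. It assumes Samejima's logistic form of the GRM and a Lucke-style graded log-logistic parameterization; the item parameters, the exp() transformations, and the function names are illustrative assumptions, not the specification or code used in the article.

    # Illustrative sketch only: checks numerically that a GRM item and an LL item
    # whose parameters are nonlinear transformations of each other (xi = exp(theta),
    # beta_k = exp(b_k)) yield identical cumulative response propensities.
    import numpy as np

    def grm_cumulative(theta, a, b):
        # Samejima GRM: P(X >= k | theta), bipolar latent trait metric
        return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))

    def ll_cumulative(xi, alpha, beta):
        # Graded log-logistic: P(X >= k | xi), unipolar latent trait metric (xi > 0)
        z = (xi[:, None] / beta[None, :]) ** alpha
        return z / (1.0 + z)

    a = 1.8                                # hypothetical discrimination
    b = np.array([-1.0, 0.0, 1.2])         # hypothetical ordered thresholds
    theta = np.linspace(-3, 3, 7)          # trait values on the GRM metric
    xi, beta = np.exp(theta), np.exp(b)    # nonlinear transformation to the LL metric

    print(np.allclose(grm_cumulative(theta, a, b), ll_cumulative(xi, a, beta)))  # True

Under these assumed forms the propensities (and hence the model-implied correlation structure) coincide, but theta and xi are different metrics: equal steps on theta are compressed near zero and stretched at the high end of xi, which is one way the two models can support different substantive interpretations of the same data.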


Summary

Introduction

Item response theory (IRT) models were developed to solve practical testing problems in large-scale, multiple-choice, cognitive aptitude testing (see Lord, 1980). IRT models and the psychometric procedures derived from them have brought about revolutionary changes in how cognitive ability tests are analyzed, developed, administered, and scored. Common applications include: (a) linking methods to place test scores from different item sets onto the same scale (Lee & Lee, 2018), (b) statistical approaches for detecting differential item functioning to identify items that may be inappropriate for some examinee populations (Millsap, 2012), and (c) computerized adaptive testing methods for achieving precise scores with as few items as possible (Magis, Yan, & von Davier, 2017). Over the past two decades, applications of parametric IRT models have been extended beyond multiple-choice ability testing to, for example, the domains of political science (Treier & Jackman, 2008), sociology (Osgood, McMorris, & Potenza, 2002), personality

Section outline

  • IRT model assumptions
  • Standard IRT Models and PRO Constructs
  • Alternative IRT-Related Models
  • Present Research
  • Psychometrics
  • The Graded Response Model
  • LL IRT Model
  • Equivalence
  • Item Response Curves
  • Latent Trait Scores
  • Discussion
  • Findings
  • Basic Differences Between the GRM and the LL Model