Abstract

Background

Psychometric analyses of patient-reported outcomes (PROs) typically use classical test theory (CTT), item response theory (IRT), or Rasch measurement theory (RMT). The three papers from the ISOQOL Psychometrics SIG examined the same data set using these three different approaches. By comparing the results from these papers, the current paper aims to examine the extent to which conclusions about the validity and reliability of a PRO tool depend on the selected psychometric approach.

Main text

Regarding the basic statistical model, IRT and RMT are relatively similar but differ notably from CTT. However, modern applications of CTT diminish these differences. In analyses of item discrimination, CTT and IRT gave very similar results, while RMT requires equal discrimination and therefore suggested excluding items that deviated too much from this requirement. Thus, fewer items fitted the Rasch model. In analyses of item thresholds (difficulty), IRT and RMT provided fairly similar results; item thresholds are typically not evaluated in CTT. Analyses of local dependence showed only moderate agreement between the methods, partly due to different thresholds for important local dependence. Analyses of differential item functioning (DIF) showed good agreement between IRT and RMT. Agreement might be further improved by adjusting the thresholds for important DIF. Analyses of measurement precision across the score range showed high agreement between the IRT and RMT methods. CTT assumes constant measurement precision throughout the score range and thus gave different results. Category orderings were examined in the RMT analyses by checking for reversed thresholds. However, this approach is controversial within the RMT community. The same issue can be examined with the nominal categories IRT model.

Conclusions

While there are well-known differences between CTT, IRT, and RMT, the comparison of three actual analyses revealed a great deal of agreement between the results of the methods.
If the undogmatic attitude of the three current papers is maintained, the field will be well served.
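The equal-discrimination requirement that distinguishes RMT from IRT can be made concrete with a small sketch. The dichotomous Rasch model is the two-parameter logistic (2PL) IRT model with the discrimination fixed at a common value; freeing the discrimination per item is what lets IRT accommodate the item-level differences that a Rasch analysis flags as misfit. The parameter values below are assumptions chosen purely for illustration, not values from the papers under discussion.

```python
import math

def p_correct_2pl(theta, alpha, beta):
    """2PL logistic item response function:
    P(correct) = 1 / (1 + exp(-alpha * (theta - beta)))."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def p_correct_rasch(theta, beta):
    """Rasch model: the 2PL with discrimination fixed at 1."""
    return p_correct_2pl(theta, 1.0, beta)

# Three items with the same difficulty (beta = 0) but different
# discriminations, evaluated for a person with latent score theta = 1:
theta = 1.0
flat = p_correct_2pl(theta, alpha=0.5, beta=0.0)   # weakly discriminating
rasch = p_correct_rasch(theta, beta=0.0)           # Rasch (alpha = 1)
steep = p_correct_2pl(theta, alpha=2.0, beta=0.0)  # strongly discriminating
print(flat, rasch, steep)  # response curves fan out around the Rasch curve
```

Items whose estimated discriminations deviate strongly from the common value (here, the `flat` and `steep` items) are exactly those a Rasch analysis would suggest excluding, while a 2PL IRT analysis would retain them with item-specific slopes.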

Highlights

  • Regarding the basic statistical model, item response theory (IRT) and Rasch measurement theory (RMT) are relatively similar but differ notably from classical test theory (CTT)

  • Item and test information functions are usually calculated in an IRT analysis ([2] Figs. 4 and 5), while RMT analyses usually rely on person-item threshold maps ([3] Fig. 1)

  • As stated in Donald Patrick’s excellent introduction [18], the authors of the three papers in this series are to be commended for their transparency and rigor in tackling what has been a sometimes contentious debate about different strengths and weaknesses of CTT, IRT, and Rasch analysis


Main text

Categorical data factor analysis using polychoric correlations is equivalent to fitting a so-called normal-ogive IRT model [6]. In this model, the probability of answering in category c or higher is fitted using the cumulative normal distribution: \( P(x_{ij} \ge c) = \int_{-\infty}^{\alpha_i(\theta_j - \beta_{ci})} \varphi(t)\,dt \), where \(x_{ij}\) is the response of person j on item i, \(\theta_j\) is the latent depression score for person j, \(\alpha_i\) is a discrimination parameter for item i, and \(\beta_{ci}\) is a threshold parameter for category c of item i. From the IRT perspective, an item that fits the Rasch model has no problems with regard to response category ordering, regardless of the ordering of thresholds. This is an example of an issue where the measurement philosophy differs between the two traditions. For all researchers interested in the issue of disordered categories, I strongly recommend the thorough analysis by García-Pérez [13]
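The normal-ogive model above can be sketched in a few lines: the cumulative probability of responding in category c or higher is the standard normal CDF evaluated at \(\alpha_i(\theta_j - \beta_{ci})\), and individual category probabilities are differences of adjacent cumulative probabilities. The parameter values are illustrative assumptions only.

```python
# Minimal sketch of the normal-ogive (graded response) model described above:
# P(x_ij >= c) = Phi(alpha_i * (theta_j - beta_ci)), Phi = standard normal CDF.
from scipy.stats import norm

def prob_category_or_higher(theta, alpha, beta_c):
    """Probability that a person with latent score theta responds in
    category c or higher, given item discrimination alpha and the
    threshold beta_c for category c (cumulative normal link)."""
    return norm.cdf(alpha * (theta - beta_c))

def category_probs(theta, alpha, betas):
    """Probabilities for each response category, given ordered thresholds
    betas for categories 1..C-1 (category 0 needs no threshold).
    Each category probability is a difference of adjacent cumulative
    probabilities."""
    cum = [1.0] + [prob_category_or_higher(theta, alpha, b) for b in betas]
    cum.append(0.0)
    return [cum[c] - cum[c + 1] for c in range(len(betas) + 1)]

# Example: a 4-category item with illustrative parameters
probs = category_probs(theta=0.5, alpha=1.2, betas=[-1.0, 0.0, 1.5])
print(probs, sum(probs))  # category probabilities sum to 1
```

Fixing every `alpha` to a common value turns this into the Rasch-family (rating scale/partial credit style) restriction discussed above, which is why category probabilities, and hence category ordering, can look different under the two traditions.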
