International application of PROMIS computerized adaptive tests: US versus country-specific item parameters can be consequential for individual patient scores

Caroline B Terwee, Martine H.P Crins, Leo D Roorda, Karon F Cook, David Cella, Niels Smits, Benjamin D Schalet

doi:10.1016/j.jclinepi.2021.01.011

Abstract

Objective: PROMIS offers computerized adaptive tests (CAT) of patient-reported outcomes, using a single set of US-based IRT item parameters across populations and language-versions. The use of country-specific item parameters has local appeal, but also disadvantages. We illustrate the effects of choosing US or country-specific item parameters on PROMIS CAT T-scores.

Study design and setting: Simulations were performed on response data from Dutch chronic pain patients (n = 1110) who completed the PROMIS Pain Behavior item bank. We compared CAT T-scores obtained with (1) US parameters; (2) Dutch item parameters; (3) US item parameters for DIF-free items and Dutch item parameters (rescaled to the US metric) for DIF items; (4) Dutch item parameters for all items (rescaled to the US metric).

Results: Without anchoring to a common metric, CAT T-scores cannot be compared. When scores were rescaled to the US metric, mean differences in CAT T-scores based on US vs. Dutch item parameters were negligible. However, 0.9%–4.3% of the T-score differences were larger than 5 points (0.5 SD).

Conclusion: The choice of item parameters can be consequential for individual patient scores. We recommend more studies of translated CATs to examine if strategies that allow for country-specific item parameters should be further investigated.

Highlights

Item response theory (IRT) is increasingly used to create item banks as the basis for computerized adaptive testing (CAT) for measuring patient-reported outcomes (PROs) [1,2,3,4,5]
When scores were rescaled to the US metric, mean differences in CAT T-scores based on US vs. Dutch item parameters were negligible
0.9%–4.3% of the T-score differences were larger than 5 points (0.5 standard deviation (SD))

Summary

Introduction

Item response theory (IRT) is increasingly used to create item banks as the basis for computerized adaptive testing (CAT) for measuring patient-reported outcomes (PROs) [1,2,3,4,5]. The Patient-Reported Outcomes Measurement Information System (PROMIS) is the largest system of PRO item banks administered as CATs [9,10,11,12]. The default PROMIS convention is to use a single set of IRT item parameters across populations and language-versions to express scores on a common scale (T-score metric), unless evidence shows that this is problematic, e.g., if items function substantially differently across populations or language-versions [9,13]. A method adopted from the equating and linking literature, called the Stocking-Lord method, was used for this purpose [25,26,27].
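To make the idea of placing item parameters on a common metric concrete, the following is a minimal sketch of Stocking-Lord linking for graded response model items, not the authors' implementation. It searches for linking constants A and B such that the test characteristic curve of the rescaled source parameters (a/A, A·b + B) matches that of the target parameters, and then expresses theta as a T-score (mean 50, SD 10). All item parameter values, the theta grid, the function names, and the compass-search optimizer are assumptions chosen for illustration.

```python
import math


def logistic(x):
    # Numerically stable logistic function: 1 / (1 + exp(-x)).
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def expected_item_score(theta, a, thresholds):
    # Graded response model: with categories scored 0..m, the expected score
    # equals the sum of the cumulative category probabilities.
    return sum(logistic(a * (theta - b)) for b in thresholds)


def tcc(theta, items):
    # Test characteristic curve: expected total score at a given theta.
    return sum(expected_item_score(theta, a, bs) for a, bs in items)


def rescale(items, A, B):
    # Linear change of metric: theta_target = A * theta_source + B.
    return [(a / A, [A * b + B for b in bs]) for a, bs in items]


def fit_stocking_lord(source_items, target_items, grid, step=0.5, tol=1e-6):
    # Minimize the squared TCC difference over the linking constants (A, B)
    # using a simple compass search (an illustrative optimizer choice).
    target_curve = [tcc(t, target_items) for t in grid]

    def loss(A, B):
        moved = rescale(source_items, A, B)
        return sum((y - tcc(t, moved)) ** 2 for t, y in zip(grid, target_curve))

    A, B = 1.0, 0.0
    best = loss(A, B)
    while step > tol:
        candidates = [(A + step, B), (A - step, B), (A, B + step), (A, B - step)]
        candidates = [(ca, cb) for ca, cb in candidates if ca > 0.05]
        value, ca, cb = min((loss(ca, cb), ca, cb) for ca, cb in candidates)
        if value < best:
            A, B, best = ca, cb, value
        else:
            step /= 2.0
    return A, B


def t_score(theta):
    # T-score metric: mean 50 and SD 10 in the reference population.
    return 50.0 + 10.0 * theta


# Hypothetical item parameters (discrimination, thresholds) on the target metric.
target_items = [
    (1.5, [-1.0, 0.0, 1.0]),
    (2.0, [-0.5, 0.5, 1.5]),
    (1.0, [-1.5, -0.2, 0.8]),
    (2.5, [0.0, 0.7, 1.4]),
]

# Build source-metric parameters with a known stretch and shift, so the
# recovered constants can be checked against TRUE_A and TRUE_B.
TRUE_A, TRUE_B = 1.2, 0.3
source_items = [
    (a * TRUE_A, [(b - TRUE_B) / TRUE_A for b in bs]) for a, bs in target_items
]

grid = [-4.0 + 0.2 * i for i in range(41)]
A_hat, B_hat = fit_stocking_lord(source_items, target_items, grid)
print(round(A_hat, 3), round(B_hat, 3))
```

In this toy setup the source parameters are an exact linear transformation of the target parameters, so the fitted constants should land close to TRUE_A and TRUE_B. On the T-score metric, a difference of 0.5 in theta corresponds to 5 T-score points, which is the 0.5 SD threshold referred to in the abstract.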

Methods

Results

Discussion

Conclusion