Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

Véronique Sébille, Tanguy Le Néel, Gildas Kubis, Jean-Benoit Hardouin, Francis Guillemin, François Boyer, Bruno Falissard

doi:10.1186/1471-2288-10-24

Abstract

Background: Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies are used for these data: classical test theory (CTT), based on the observed scores, and models from Item Response Theory (IRT). However, whether IRT or CTT is the more appropriate method for analysing PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared.

Methods: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT- or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known (for item parameters, to a certain extent, with precision ranging from good to poor) or unknown and therefore had to be estimated. The power obtained with IRT and CTT was compared, and the parameters having the strongest impact on it were identified.

Results: When person parameters were assumed to be unknown, whether item parameters were known or not, the power achieved using IRT and CTT was similar and always lower than the power expected from the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods.

Conclusion: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take account of the number of items to obtain an accurate formula.
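The "well-known sample size formula for normally distributed endpoints" is not written out in the abstract. A minimal sketch, assuming it refers to the standard two-sided, two-sample normal approximation with equal group sizes and a standardized effect size, is given below. The function names and the default values of alpha and power are illustrative choices, not taken from the paper.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def expected_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (assumption:
    equal variances and equal group sizes).
    """
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of rejecting in either tail.
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def required_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Smallest per-group sample size from the normal approximation."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)


if __name__ == "__main__":
    n = required_n_per_group(0.5)
    print(n, round(expected_power(0.5, n), 3))
```

In this sketch the expected power depends only on the effect size, the sample size per group and alpha. It contains no term for the number of items, which the abstract identifies as having a substantial impact on the power actually achieved.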

Highlights

Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research
Classical test theory (CTT) relies on the observed scores, which are assumed to provide a good representation of a “true” score, while Item Response Theory (IRT) relies on an underlying response model relating the item responses to a latent parameter, often called the latent trait, interpreted as the true individual Quality of Life (QoL), for instance
Simulation study, Situation 1: the power achieved by the tests of group effects using IRT modelling (Rasch model) with fixed μIRT1 and δj parameters (j = 1, ..., J), with different levels of precision for the latter as compared with their simulated values, is given in additional file 1 for different values of the effect size on the latent trait scale (ESIRT), the sample size per group N, and the number of items J

Summary

Introduction

Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies are used for these data: classical test theory (CTT), based on the observed scores, and models from Item Response Theory (IRT). CTT relies on the observed scores (possibly a weighted sum of the patients' item responses), which are assumed to provide a good representation of a “true” score, while IRT relies on an underlying response model relating the item responses to a latent parameter, often called the latent trait, interpreted as the true individual QoL, for instance. Such IRT models take into account some item parameters
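The contrast between the two approaches can be illustrated with a small simulation. The sketch below assumes dichotomous items following the Rasch model named in the Highlights, in which the probability of a positive response depends on the difference between the person's latent trait θ and the item difficulty δj. It then computes the unweighted sum score that a CTT analysis would use. The item difficulties, group means, sample size and seed are hypothetical values chosen for illustration, not those of the paper.

```python
import math
import random


def rasch_probability(theta, delta):
    """P(positive response) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def simulate_group(n, mean, deltas, rng, sd=1.0):
    """Simulate 0/1 responses to len(deltas) items for n patients.

    Latent traits are drawn from a normal distribution with the
    given mean and standard deviation (an assumption of this sketch).
    """
    responses = []
    for _ in range(n):
        theta = rng.gauss(mean, sd)
        row = [1 if rng.random() < rasch_probability(theta, d) else 0
               for d in deltas]
        responses.append(row)
    return responses


def sum_scores(responses):
    """CTT-style observed score: unweighted sum of item responses."""
    return [sum(row) for row in responses]


rng = random.Random(2010)                # fixed seed for reproducibility
deltas = [-1.0, -0.5, 0.0, 0.5, 1.0]     # hypothetical difficulties, J = 5
n_per_group = 200                        # hypothetical N per group

group0 = simulate_group(n_per_group, 0.0, deltas, rng)
group1 = simulate_group(n_per_group, 0.5, deltas, rng)  # latent shift of 0.5

scores0 = sum_scores(group0)
scores1 = sum_scores(group1)

if __name__ == "__main__":
    print(sum(scores0) / n_per_group, sum(scores1) / n_per_group)
```

A CTT-based comparison would test the difference between `scores0` and `scores1` directly, whereas an IRT-based comparison would fit the response model and test the group effect on the latent trait scale.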

Objectives

Methods

Results

Discussion

Conclusion