Comparing five depression measures in depressed Chinese patients using item response theory: an examination of item properties, measurement precision and score comparability

Yue Zhao, Wai Chan, Barbara Chuen Yee Lo

doi:10.1186/s12955-017-0631-y

Abstract

Background: Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures. The purpose of this study is to apply IRT to examine item properties (discrimination and severity of depressive symptoms), measurement precision and score comparability across five depression measures. It is the first study of its kind in the Chinese context.

Methods: A clinical sample of 207 Hong Kong Chinese outpatients was recruited. Data analyses included classical item analysis, IRT concurrent calibration and IRT true score equating. The IRT assumptions of unidimensionality and local independence were tested using confirmatory factor analysis and chi-square statistics, respectively. The IRT linking assumptions of construct similarity, equity and subgroup invariance were also tested. The graded response model was applied to concurrently calibrate all five depression measures in a single IRT run, which placed the item parameter estimates of these measures onto a single common metric. IRT true score equating was then used to link outcome scores and construct score concordances, so that a score on one measure could be mapped to the corresponding score on another measure for direct comparison.

Results: Findings suggested that (a) symptoms of depressed mood, suicidality and feelings of worthlessness served as the strongest discriminating indicators, and symptoms concerning suicidality, changes in appetite, depressed mood, feelings of worthlessness and psychomotor agitation or retardation reflected high levels of severity in the clinical sample; (b) the five depression measures contributed varying degrees of measurement precision at different levels of depression; and (c) after outcome score linking was performed across the five measures, the cut-off scores led to either consistent or discrepant diagnoses for depression.

Conclusions: The study provides additional evidence regarding the psychometric properties and clinical utility of the five depression measures, offers methodological contributions to the appropriate use of IRT in PRO measures, and helps elucidate cultural variation in depressive symptomatology. The approach of concurrently calibrating and linking multiple PRO measures can be applied to the assessment of PROs in contexts other than depression.
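
The graded response model and IRT true score equating named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the item parameters for the two measures (`items_x`, `items_y`) are made-up values assumed to sit on a common metric already, as they would after concurrent calibration. The sketch computes category probabilities under a logistic graded response model, sums expected item scores into a test-level true score, and maps a score on one measure to the other by way of the latent trait value that produces it.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under a logistic graded response model.

    theta: latent trait value; a: discrimination; b: ascending thresholds.
    Returns an array with len(b) + 1 probabilities (categories 0..K-1).
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then take adjacent differences
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

def true_score(theta, items):
    """Expected summed score on a measure at a given theta."""
    total = 0.0
    for a, b in items:
        probs = grm_category_probs(theta, a, b)
        total += float(np.dot(np.arange(len(probs)), probs))
    return total

def theta_for_true_score(target, items, lo=-8.0, hi=8.0, tol=1e-8):
    """Invert the true score function by bisection (it increases with theta when a > 0)."""
    max_score = sum(len(b) for _, b in items)
    if not 0.0 < target < max_score:
        raise ValueError("target must lie strictly between 0 and the maximum score")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

def equate_true_score(score_x, items_x, items_y):
    """Map a true score on measure X to the equivalent true score on measure Y."""
    theta = theta_for_true_score(score_x, items_x)
    return true_score(theta, items_y)

# Hypothetical (discrimination, thresholds) pairs; NOT estimates from the study.
items_x = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]
items_y = [(1.2, [-1.5, -0.5, 0.5]), (1.8, [0.0, 1.0, 2.0]), (0.9, [-1.0, 0.5, 1.5])]

if __name__ == "__main__":
    for score in (1.0, 3.0, 5.0):
        print(score, round(equate_true_score(score, items_x, items_y), 2))
```

Repeating the mapping for each possible score on one measure gives a concordance table of the kind the abstract describes. In practice the scores at the extremes (0 and the maximum) need separate handling, because no finite trait value produces them.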

Highlights

Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures
PRO measures have great potential to be integrated into healthcare practice and substantially contribute to elucidating the properties of symptoms directly reported by patients
IRT assumption checking: for each depression measure and for the combined item set, the ratio of the first to the second eigenvalue considerably exceeded 4
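
The eigenvalue-ratio check mentioned in the highlights can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name is hypothetical, the data are simulated, and the sketch uses Pearson correlations, whereas an analysis of ordinal item responses might use polychoric correlations instead.

```python
import numpy as np

def eigenvalue_ratio(responses):
    """Ratio of the largest to the second-largest eigenvalue of the item correlation matrix.

    responses: 2-D array, rows are respondents and columns are items.
    """
    corr = np.corrcoef(responses, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order for a symmetric matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    return eigvals[0] / eigvals[1]

# Simulated data (assumption for illustration only): 500 respondents, 10 items.
rng = np.random.default_rng(0)
trait = rng.standard_normal((500, 1))

# One common factor with loading 0.8 on every item, plus unique noise
one_factor = 0.8 * trait + 0.6 * rng.standard_normal((500, 10))

# Pure noise: no common factor at all
no_factor = rng.standard_normal((500, 10))

if __name__ == "__main__":
    print("one factor:", eigenvalue_ratio(one_factor))
    print("no factor: ", eigenvalue_ratio(no_factor))
```

A ratio well above 4 for the one-factor data, and a ratio near 1 for the noise data, shows how the statistic separates a dominant single dimension from its absence.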

Summary

Introduction

Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures. The purpose of this study is to apply IRT to examine item properties (discrimination and severity of depressive symptoms), measurement precision and score comparability across five depression measures. It is the first study of its kind in the Chinese context. In a paper commissioned by the U.S. National Quality Forum on the issues to consider when evaluating PROs as candidate performance measures in healthcare settings, Cella et al. [1] remarked on several methodological issues related to the use of PROs in patient-centered outcomes research. PRO measures have great potential to be integrated into healthcare practice and can contribute substantially to elucidating the properties of symptoms directly reported by patients (see, for example, [2]).

Objectives

Methods

Results

Discussion

Conclusion