Abstract

Objective: The aims were to review practices concerning differential item functioning (DIF) detection in composite measurement scales, particularly those used in health research, and to provide guidance on how to proceed if statistically significant DIF is detected.

Methods: This work specifically addressed the Rasch model, which is the subject of growing interest in the field of health owing to its particularly advantageous properties. There were three steps: 1) a literature review to describe current practices; 2) a simulation study to determine under which conditions encountered in health research studies erroneous conclusions can be drawn from group comparisons when a scale is affected by DIF that is not taken into account; 3) based on steps 1 and 2, formulation of recommendations that were subsequently reviewed by leading internationally recognized experts.

Results: Four key recommendations were formulated to help researchers determine whether statistically significant DIF is meaningful in practice, according to the kind of DIF (uniform or non-uniform) and the DIF effect size.

Conclusion: This work provides the first recommendations on how to deal in practice with the presence of DIF in composite measurement scales used in health research studies.

Highlights

  • Other than some purely descriptive studies, almost all health research studies include group comparisons: typical study designs involve a primary outcome measured in every subject, whose occurrence or mean is compared between groups defined by a characteristic or exposure of interest

  • In 156 (55%) articles, measurement invariance was assessed within the confirmatory factor analysis (CFA) framework only, in 98 (34%) it was assessed within the item response theory (IRT) framework only, and in 19 (7%) articles only observed variable methods were used (12 logistic regression, 4 Mantel-Haenszel, 3 others)


Summary

Introduction

Other than some purely descriptive studies, almost all health research studies include group comparisons: typical study designs involve a primary outcome measured in every subject, whose occurrence (if categorical) or mean (if continuous) is compared between groups defined by a characteristic or exposure of interest. Composite measurement scales are used in epidemiological studies to measure complex health-related constructs, including quality of life, depression and satisfaction with care. In a logistic regression of the response to an item on the level of the measured construct and on group membership, if the regression coefficient related to group membership is statistically significant, the probability of responding positively to this item differs significantly between the groups at any given level of the measured construct. This phenomenon is termed differential item functioning (DIF): the item “functions” differently in the groups to be compared [8,10]. If DIF, uniform or non-uniform, is present in a scale, measurement invariance does not hold and group comparisons may be inaccurate.
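The group-membership test described above can be illustrated with a minimal sketch. It is not the procedure used in the paper: it simulates dichotomous responses under a Rasch model for two groups with the same latent distribution, makes one item harder in the second group (uniform DIF), and then fits a logistic regression of that item on a proxy for the construct and on group membership. The item difficulties, the DIF size of 1 logit, the sample size, the use of the rest score (sum of the other items) as the proxy, and all function names are assumptions chosen for illustration.

```python
import math
import random


def simulate_rasch(n_per_group=1000, difficulties=(-1.0, -0.5, 0.0, 0.5, 1.0),
                   dif_item=0, dif_size=1.0, seed=0):
    """Simulate Rasch responses for two groups with identical latent distributions.
    Item `dif_item` is `dif_size` logits harder in group 1 (uniform DIF).
    All numeric settings are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for group in (0, 1):
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)  # latent level of the person
            resp = []
            for j, b in enumerate(difficulties):
                shift = dif_size if (group == 1 and j == dif_item) else 0.0
                p = 1.0 / (1.0 + math.exp(-(theta - b - shift)))
                resp.append(1 if rng.random() < p else 0)
            rows.append((group, resp))
    return rows


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return coefficients and SEs."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(bj * xj for bj, xj in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        beta = [bj + s for bj, s in zip(beta, solve(hess, grad))]
    # Standard errors from the diagonal of the inverse information matrix.
    se = [math.sqrt(solve(hess, [float(i == j) for i in range(k)])[j])
          for j in range(k)]
    return beta, se


def uniform_dif_test(rows, item):
    """Regress one item on the rest score and group; return (group coef, z)."""
    X, y = [], []
    for group, resp in rows:
        rest = sum(resp) - resp[item]  # proxy for the measured construct
        X.append([1.0, float(rest), float(group)])
        y.append(resp[item])
    beta, se = fit_logistic(X, y)
    return beta[2], beta[2] / se[2]


rows = simulate_rasch()
coef, z = uniform_dif_test(rows, item=0)
print(f"group coefficient = {coef:.2f}, z = {z:.2f}")
```

A group coefficient that differs significantly from zero flags uniform DIF on the studied item. Testing for non-uniform DIF would additionally require an interaction term between the construct proxy and group membership, which this sketch omits.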

