Abstract

As illustrated by the fine collection of articles included in this special issue of Medical Care, a variety of statistical methods exist for the detection of measurement bias or differential item functioning (DIF). These methods range from those based on parametric latent variable models to methods that do not explicitly depend on latent variable assumptions and that require no model fitting. This diversity presents some difficulties for researchers who seek to use these methods. Which of the available methods should be used? Will the different methods lead to similar conclusions? What are the strengths and weaknesses of the available methods? This brief review addresses these questions in relation to the methods illustrated in the articles included in this special issue. As described subsequently, the term bias is used throughout to refer to systematic inaccuracy in measurement.

We begin by considering some general issues in bias detection that are relevant to all detection methods. All of the bias detection methods of interest here are concerned with systematic (nonrandom) group differences in scale or item scores, with groups often defined by demographic variables (eg, language status) or by some other measured characteristic. A central feature of all current bias detection methods is the matching principle: systematic group differences in scores on a scale or item are considered evidence of measurement bias only if those differences remain among individuals who are all matched on the construct or latent variable being measured by the scale or item.1,2 In the absence of any matching, group differences in scale or item scores may or may not indicate bias. Among individuals who are matched on the construct being measured, an unbiased scale or item should yield scores that do not differ across groups apart from random variation. Systematic group differences in scale or item scores that remain after matching are taken as evidence of bias.
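To make the matching principle concrete, the following Python sketch contrasts an unmatched group difference in item scores with the difference that remains within matched strata. It is a minimal illustration under stated assumptions: item scores are dichotomous (0/1), an observed matching score stands in for the construct, and within-stratum differences are weighted by the number of focal-group respondents (one reasonable choice among several). The function name `matched_difference`, the `"ref"`/`"focal"` labels, and the toy data are hypothetical and are not taken from any article in this issue.

```python
from collections import defaultdict


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def matched_difference(records):
    """Contrast an unmatched and a matched group difference for one item.

    records: iterable of (group, match_score, item_score) tuples, where
    group is "ref" or "focal" (hypothetical labels), match_score is an
    observed score used for matching, and item_score is 0 or 1.
    Returns (raw_diff, matched_diff), both as focal minus reference.
    """
    records = list(records)

    # Unmatched comparison: difference in mean item score between groups.
    raw = (mean([y for g, _, y in records if g == "focal"])
           - mean([y for g, _, y in records if g == "ref"]))

    # Form strata of respondents who share the same matching score.
    strata = defaultdict(lambda: {"ref": [], "focal": []})
    for g, s, y in records:
        strata[s][g].append(y)

    # Within each stratum containing both groups, take the focal-minus-
    # reference difference and weight it by the focal-group count.
    num, den = 0.0, 0
    for cell in strata.values():
        if cell["ref"] and cell["focal"]:
            n_focal = len(cell["focal"])
            num += n_focal * (mean(cell["focal"]) - mean(cell["ref"]))
            den += n_focal
    matched = num / den if den else float("nan")
    return raw, matched


# Hypothetical toy data: the focal group is concentrated in the low stratum,
# but within each stratum both groups answer the item correctly at the same rate.
data = ([("ref", 1, y) for y in [1, 0, 0, 0]]
        + [("focal", 1, y) for y in [1, 1, 0, 0, 0, 0, 0, 0]]
        + [("ref", 2, y) for y in [1, 1, 1, 1, 1, 1, 0, 0]]
        + [("focal", 2, y) for y in [1, 1, 1, 0]])

raw, matched = matched_difference(data)
print(round(raw, 3), round(matched, 3))  # → -0.167 0.0
```

In these made-up data the focal group scores lower on the item overall, yet within each stratum the two groups have identical proportions correct, so the matched difference is zero. Under the matching principle, the raw difference alone would therefore not be taken as evidence of bias.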
The matching principle has one glaring weakness: explicit matching on the construct underlying the scale or item is generally impossible because no perfect measure of this construct is available. The various bias detection methods can be viewed as different approaches to resolving this dilemma. Two broad categories of detection methods exist.2 The first category uses an observed, imperfect measure of the construct to achieve the desired matching. These observed score methods typically use the sum of the item scores on the scale under study as the matching measure. In principle, other observed measures that are external to the scale under study could be used. This option is seldom adopted in practice and could be more widely considered. The exclusive use of external matching variables can lead to spurious findings of bias, however.2 Examples of observed score methods include the Mantel-Haenszel (MH) procedure,3 standardization procedures,4 and logistic regression.5

The second category of detection procedures uses latent variable models to achieve the desired matching. These latent variable methods evaluate bias by examining whether the same latent variable model holds for the scale or item in each group. If the same model holds in each group, we expect that two individuals with the same score on the latent variable, but who are from different groups, should achieve the same score on the scale or item apart from random error. Here the matching is implicit, because it is ordinarily not necessary to actually produce the latent variable scores for this purpose. Latent variable methods typically adopt either the common factor model as the basis for the analysis or one
