Cognitive screening tests and items have been found to perform differently across groups that differ in education, ethnicity, and race. Despite the profound implications that such bias holds for studies in the epidemiology of dementia, little research has been conducted in this area. Using the methods of modern psychometric theory (in addition to those of classical test theory), we examined the performance of the Attention subscale of the Mattis Dementia Rating Scale. Several item response theory (IRT) models were compared, including the two- and three-parameter dichotomous logistic models and a polytomous response model. (Log-likelihood ratio tests showed that the three-parameter model was not an improvement over the two-parameter model.) Data were collected as part of the ten-study National Institute on Aging Collaborative investigation of special dementia care in institutional settings. The subscale's KR-20 reliability estimate for this sample was 0.92. IRT model-based reliability estimates, computed at several points along the latent attribute, ranged from 0.65 to 0.97; the measure was least precise at the less disabled tail of the distribution. Most items performed similarly across education groups; the item characteristic curves were almost identical, indicating little or no differential item functioning (DIF). However, four items were problematic. One item (digit span backwards) demonstrated a large error term in the confirmatory factor analysis; item-fit chi-square statistics obtained with BIMAIN confirmed this result for the IRT models. Further, the discrimination parameter for that item was low for all education subgroups, and persons with the highest education generally had a greater probability of passing the item at most levels of theta. Model-based tests of DIF using MULTILOG identified three other items with significant, albeit small, DIF. One item, for example, showed non-uniform DIF: at the impaired tail of the latent distribution, persons with higher education had a higher probability of responding correctly than did lower education groups, whereas at less impaired levels the pattern reversed. A second detection method also flagged this item (unsigned area statistics = 3.05, p < 0.01, and 2.96, p < 0.01). On average, across the entire score range, the lower education group's probability of answering the item correctly was 0.11 higher than the higher education group's. A cross-validation with larger subgroups confirmed the overall result of little DIF for this measure. The methods used here to detect differential item functioning (which may, in turn, be indicative of bias) were applied to a neuropsychological subtest; they have been used previously to examine bias in screening measures across education, ethnic, and racial subgroups. Beyond the epidemiological application of ensuring that screening measures and neuropsychological tests used in diagnosis are free of bias, so that more culture-fair classifications result, these methods are also useful for examining site differences in large multi-site clinical trials. We recommend that these methods receive wider attention in the medical statistical literature.
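For readers less familiar with the models compared above, the following expressions use standard IRT notation (they are not drawn from the paper itself). The two-parameter logistic (2PL) model gives each item a discrimination parameter \(a_i\) and a difficulty parameter \(b_i\); the three-parameter (3PL) model adds a lower-asymptote (pseudo-guessing) parameter \(c_i\):

\[
P_i(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_i)]} \qquad \text{(2PL)}
\]

\[
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-a_i(\theta - b_i)]} \qquad \text{(3PL)}
\]

Because the 2PL is the 3PL with every \(c_i = 0\), the models are nested, and the log-likelihood ratio statistic \(G^2 = -2(\ln L_{\mathrm{2PL}} - \ln L_{\mathrm{3PL}})\) can be referred to a chi-square distribution with degrees of freedom equal to the number of \(c_i\) parameters added; a non-significant \(G^2\) supports the abstract's conclusion that the 3PL was not an improvement.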
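The unsigned area statistic reported above indexes the area between two groups' item characteristic curves; where the curves cross, as in the non-uniform DIF example, the signed area can be small even when DIF is present, which is why the unsigned version is used. Below is a minimal, hypothetical Python sketch that approximates this index numerically for a 2PL item. It omits the closed-form version of the statistic and its significance test, and the parameter values are invented for illustration, not taken from the paper.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def unsigned_area(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=2001):
    """Unsigned area between reference- and focal-group ICCs,
    approximated with the trapezoid rule over theta in [lo, hi].
    Larger values suggest more DIF; crossing curves correspond to
    non-uniform DIF of the kind described in the abstract."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc))
    return np.sum((gap[1:] + gap[:-1]) * np.diff(theta) / 2.0)

# Hypothetical group-specific item parameters chosen so the curves cross
# (illustration only; these are not the paper's estimates):
print(round(unsigned_area(a_ref=1.5, b_ref=0.0, a_foc=0.8, b_foc=-0.3), 3))
```

In practice the group-specific parameters would first be placed on a common scale (e.g., via anchor items) before the area is computed; that linking step is assumed, not shown, here.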