Three-parameter Logistic Item Response Theory Research Articles

When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel—Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item score, but primarily on short tests. Alternative matching scores were developed based on sufficient statistics, reliability, and explicit corrections for measurement error. Manipulated factors were tests (20, 24, 26 items), reference/focal sample sizes (1,000/1,000, 800/200), proficiency distributions (identical, means differed, variances differed, means and variances differed), and simulation technique (three-parameter logistic IRT model and four-parameter beta compound-binomial model with nonparametric nonmonotonic item-true score step functions). Outcomes were as follows: TIE of MH chi-square test at the .05 nominal level; and the bias, standard error, and root mean square error of the MH delta-DIF statistic under null-DIF conditions. Of eight categorized alternative matching scores, four scores controlled TIE as well as or better than traditional summed-item score in almost all items for all conditions: (a) estimated latent proficiency from a 3PL IRT model, (b) the sum of weighted item scores where the weight was the item— total score biserial correlation coefficient excluding the item from total score, (c) the sum of weighted item scores where the weight was the item loading on the single common factor from factor analysis of tetrachoric correlation coefficients, and (d) Kelley’s linear regressed true score estimate.

Read full abstract

Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel method of differential item functioning (DIF) analysis in computerized adaptive tests (CATs). Each simulated examinee received 25 items from a 75-item pool. A three-parameter logistic item response theory (IRT) model was assumed, and examinees were matched on expected true scores based on their CAT responses and estimated item parameters. The CAT-based DIF statistics were found to be highly correlated with DIF statistics based on nonadaptive administration of all 75 pool items and with the true magnitudes of DIF in the simulation. Average DIF statistics and average standard errors also were examined for items with various characteristics. Finally, a study was conducted of the accuracy with which the modified Mantel-Haenszel procedure could identify CAT items with substantial DIF using a classification system now implemented by some testing programs. These additional analyses provided further evidence that the CAT-based DIF procedures performed well. More generally, the results supported the use of IRT-based matching variables in DIF analysis. Index terms: adaptive testing, computerized adaptive testing, differential item functioning, item bias, item response theory.

Read full abstract

Three-parameter Logistic Item Response Theory Research Articles

Articles published on Three-parameter Logistic Item Response Theory

On the Choice of the Item Response Model for Scaling PISA Data: Model Selection Based on Information Criteria and Quantifying Model Uncertainty.

Estimating standard errors of IRT true score equating coefficients using imputed item parameters

Testing the developmental theory of sex differences in intelligence using latent modeling: Evidence from the TEA Ability Battery (BAT-7)

Using the Stan Program for Bayesian Item Response Theory.

Comment on 3PL IRT Adjustment for Guessing

Alternative Matching Scores to Control Type I Error of the Mantel–Haenszel Procedure for DIF in Dichotomously Scored Items Conforming to 3PL IRT and Nonparametric 4PBCB Models

Estimating Consistency and Accuracy Indices for Multiple Classifications

Discussion

Optimal Item Discrimination and Maximum Information for Logistic IRT Models

Some New Item Selection Criteria for Adaptive Testing

A Simulation Study of Methods for Assessing Differential Item Functioning in Computerized Adaptive Tests

Irt Versus Conventional Equating Methods: A Comparative Study of Scale Stability

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Three-parameter Logistic Item Response Theory Research Articles

Articles published on Three-parameter Logistic Item Response Theory

On the Choice of the Item Response Model for Scaling PISA Data: Model Selection Based on Information Criteria and Quantifying Model Uncertainty.

Estimating standard errors of IRT true score equating coefficients using imputed item parameters

Testing the developmental theory of sex differences in intelligence using latent modeling: Evidence from the TEA Ability Battery (BAT-7)

Using the Stan Program for Bayesian Item Response Theory.

Comment on 3PL IRT Adjustment for Guessing

Alternative Matching Scores to Control Type I Error of the Mantel–Haenszel Procedure for DIF in Dichotomously Scored Items Conforming to 3PL IRT and Nonparametric 4PBCB Models

Estimating Consistency and Accuracy Indices for Multiple Classifications

Discussion

Optimal Item Discrimination and Maximum Information for Logistic IRT Models

Some New Item Selection Criteria for Adaptive Testing

A Simulation Study of Methods for Assessing Differential Item Functioning in Computerized Adaptive Tests

Irt Versus Conventional Equating Methods: A Comparative Study of Scale Stability