When the matching score is either less than perfectly reliable or not a sufficient statistic for latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure, or for any differential item functioning (DIF) procedure that matches on the summed-item score, primarily on short tests. Alternative matching scores were developed based on sufficient statistics, reliability, and explicit corrections for measurement error. Manipulated factors were tests (20, 24, and 26 items), reference/focal sample sizes (1,000/1,000 and 800/200), proficiency distributions (identical, means differed, variances differed, means and variances differed), and simulation technique (three-parameter logistic [3PL] IRT model and four-parameter beta compound-binomial model with nonparametric nonmonotonic item-true score step functions). Outcomes were the TIE rate of the MH chi-square test at the .05 nominal level and the bias, standard error, and root mean square error of the MH delta-DIF statistic under null-DIF conditions. Of the eight categorized alternative matching scores, four controlled TIE as well as or better than the traditional summed-item score for almost all items in all conditions: (a) estimated latent proficiency from a 3PL IRT model, (b) the sum of weighted item scores where the weight was the item-total score biserial correlation coefficient, with the item excluded from the total score, (c) the sum of weighted item scores where the weight was the item loading on the single common factor from a factor analysis of tetrachoric correlation coefficients, and (d) Kelley’s linear regressed true score estimate.
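As a rough illustration of two quantities named above, the following minimal Python sketch computes Kelley's linear regressed true score estimate and the MH delta-DIF statistic. It is not the study's implementation. It uses the standard textbook forms: Kelley's estimate is reliability × observed score + (1 − reliability) × group mean, and MH delta-DIF is −2.35 times the natural log of the MH common odds ratio. The function names, the `"R"`/`"F"` group labels, and the assumptions that a reliability estimate is supplied by the caller and that the matching score has already been reduced to discrete levels are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def kelley_true_score(scores, reliability):
    """Shrink each observed score toward the mean of the supplied scores.

    Assumption: the caller passes scores from a single group together with
    a reliability estimate for that group.
    """
    mean = sum(scores) / len(scores)
    return [reliability * x + (1.0 - reliability) * mean for x in scores]


def mh_delta_dif(item, group, match):
    """MH common odds ratio expressed on the delta scale.

    item  : 0/1 responses to the studied item
    group : "R" (reference) or "F" (focal) for each examinee
    match : discrete matching-score level for each examinee
    """
    # One 2x2 table per matching level:
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, m in zip(item, group, match):
        idx = (0 if g == "R" else 2) + (0 if y == 1 else 1)
        tables[m][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0.0 or den == 0.0:
        raise ValueError("common odds ratio is undefined for these tables")
    return -2.35 * math.log(num / den)
```

A continuous matching score, such as a Kelley estimate or an IRT proficiency estimate, would need to be grouped into levels before being passed as `match`; how best to do that grouping is left open here.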