Use of Automated Scoring Features to Generate Hypotheses Regarding Language-Based DIF

Mark D. Shermis, Liyang Mao, Matthew Mulholland, Vincent Kieftenbeld

doi:10.1080/15305058.2017.1308949

Abstract

This study uses the feature sets employed by two automated scoring engines to determine whether a “linguistic profile” can be formulated that helps identify items likely to exhibit differential item functioning (DIF) based on linguistic features. Sixteen items were administered to 1200 students, and demographic information was collected on gender and socioeconomic status (SES). Textual features were extracted and analyzed using the Differential Item Functioning Analysis System (DIFAS), with the Mantel-Haenszel chi-square, the Liu-Agresti cumulative common log-odds ratio, and Cox's noncentrality parameter estimator used to assess whether the focal groups (e.g., females) differed significantly from the reference groups (e.g., males). Two of the 14 items were flagged for possible DIF on gender, and four were flagged for possible DIF on SES. The responses from the focal and reference groups were then analyzed using nine machine learning algorithms to determine whether the groups differed significantly on specific linguistic features. For gender, the procedure did not reliably identify items manifesting DIF. The results for SES were more encouraging: the procedure reliably predicted the presence of DIF, but not its direction.
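For readers unfamiliar with the Mantel-Haenszel approach named in the abstract, the following is a minimal sketch of the statistic for a single dichotomously scored item, assuming examinees have already been matched into total-score strata. It is not DIFAS and does not reproduce the Liu-Agresti or Cox estimators; the function name `mantel_haenszel`, the table layout, and the example counts are all hypothetical and chosen only for illustration.

```python
import math


def mantel_haenszel(tables):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    `tables` holds one (a, b, c, d) tuple per matched score stratum:
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Returns (chi_square, common_odds_ratio, ets_delta).
    This layout is an illustrative assumption, not the DIFAS input format.
    """
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        sum_a += a
        # Expected count and variance of `a` under the no-DIF hypothesis
        sum_expected += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
        # Pieces of the common odds ratio estimate
        num += a * d / n
        den += b * c / n
    # Chi-square with continuity correction (1 degree of freedom)
    chi_square = max(0.0, abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    alpha = num / den
    # ETS delta scale: negative values indicate the item favors the reference group
    delta = -2.35 * math.log(alpha)
    return chi_square, alpha, delta


# Hypothetical counts for two score strata (made-up numbers, not study data)
example_tables = [(30, 10, 20, 20), (20, 20, 10, 30)]
chi_square, alpha, delta = mantel_haenszel(example_tables)
```

A chi-square above the 1-degree-of-freedom critical value (about 3.84 at the .05 level) would flag the item for possible DIF, and the sign of `delta` indicates which group the item favors.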

Full Text