Abstract

K–12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these domains; the scores assigned to these subsections are commonly known as subscores. In today's accountability-oriented educational environment, testing programs face increasing demand to report subscores in addition to total test scores. Although subscores can provide much-needed information for teachers, administrators, and students about proficiency in the tested domains, a major drawback of reporting them is their lower reliability compared to that of the test as a whole. In addition, treating language domains as if they were unrelated, and reporting subscores without accounting for the relationships between domains, may contradict theories of language acquisition. This study explored several methods of assigning subscores to the four domains of a state English language proficiency test: classical test theory (CTT)-based number-correct scoring, unidimensional item response theory (UIRT), augmented item response theory (A-IRT), and multidimensional item response theory (MIRT). It compared the reliability and precision of these methods across language domains and grade bands. The first two methods assess proficiency in each domain separately, without considering relationships between domains; the last two take those relationships into account. The reliability and precision of the CTT and UIRT methods were similar to each other and lower than those of A-IRT and MIRT for most domains and grade bands; MIRT was found to be the most reliable method. Policy implications and limitations of this study, as well as directions for further research, are discussed.
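The CTT baseline mentioned in the abstract can be sketched in a few lines. The following is an illustrative Python example using simulated dichotomous (0/1) item responses, not the study's data or code: it computes number-correct subscores per domain and Cronbach's alpha as the reliability estimate, showing why a short domain subsection is typically less reliable than the full test (fewer items per score). The domain layout (10 reading items, 10 listening items) and the simulation itself are assumptions for demonstration only.

```python
import numpy as np

def number_correct_subscores(responses, domain_items):
    """CTT number-correct subscore: sum of 0/1 item responses within each domain.

    responses: examinees x items array of 0/1 scores.
    domain_items: dict mapping domain name -> list of item column indices.
    """
    return {d: responses[:, items].sum(axis=1) for d, items in domain_items.items()}

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an examinees x items score matrix."""
    k = item_scores.shape[1]
    item_var_sum = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Simulate 500 examinees on a 20-item test (hypothetical layout:
# items 0-9 = reading, items 10-19 = listening), one underlying ability.
rng = np.random.default_rng(0)
ability = rng.normal(size=(500, 1))
difficulty = rng.normal(size=20)
p_correct = 1.0 / (1.0 + np.exp(-(ability - difficulty)))  # Rasch-like model
resp = (rng.random((500, 20)) < p_correct).astype(int)

domains = {"reading": list(range(0, 10)), "listening": list(range(10, 20))}
subscores = number_correct_subscores(resp, domains)
alpha_reading = cronbach_alpha(resp[:, domains["reading"]])
alpha_total = cronbach_alpha(resp)
```

With unidimensional simulated data like this, the 20-item total score comes out more reliable than the 10-item reading subscore, which is the pattern the abstract describes; the A-IRT and MIRT methods studied here improve subscore reliability by borrowing strength from the correlated domains.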

