Abstract

Large-scale testing programs have shown increasing interest in giving examinees more accurate diagnostic information by reporting overall and domain scores simultaneously. However, few studies have examined how to report and interpret reliable overall and domain scores based on bi-factor models. In this study, the authors introduced six methods of reporting overall and domain scores as weighted composite scores of the general and specific factors in a bi-factor model, and compared their performance with Yao's MIRT (multidimensional item response theory) method using both simulated and empirical data. The simulation study varied four factors: test length, number of dimensions, correlation between dimensions, and sample size. The major findings are as follows. Bifactor-M4 and Bifactor-M6, the methods that use the discrimination parameters of the specific dimensions to compute the weights, provided the most accurate and reliable overall and domain scores in most conditions, especially when the test was long, the correlation between dimensions was high, and the number of dimensions was large. Additionally, Bifactor-M4 recovered the relationship among the true ability parameters best of all the proposed methods. By contrast, Bifactor-M2, the method with equal weights, performed poorly on overall score estimation. Bifactor-M3 and Bifactor-M5, the methods in which weights were computed using the discrimination parameters of all the dimensions, performed poorly on domain score estimation. Bifactor-M1, the original factor method, produced the worst estimates.

