Abstract

The present study used the two-level testlet response model (MMMT-2) to assess impact, differential item functioning (DIF), and differential testlet functioning (DTLF) in a reading comprehension test. The data came from 21,641 applicants into English Masters’ programs at Iranian state universities. Testlet effects were estimated, and items and testlets that were functioning differentially for test takers of different genders and majors were identified. Also parameter estimates obtained under MMMT-2 and those obtained under the two-level hierarchical generalized linear model (HGLM-2) were compared. The results indicated that ability estimates obtained under the two models were significantly different at the lower and upper ends of the ability distribution. In addition, it was found that ignoring local item dependence (LID) would result in overestimation of the precision of the ability estimates. As for the difficulty of the items, the estimates obtained under the two models were almost the same, but standard errors were significantly different.

Highlights

  • Item measures may be affected by person grouping factors such as gender, L1 background, and ethnic background, among others, as well as by item grouping factors such as common input or stimulus, common response format, and item chaining, to name but a few

  • The effect of person grouping factors can be studied through impact and differential item functioning (DIF) analysis, and the effect of item grouping factors can be captured by studying testlet effect

  • The present study investigated testlet effects within the context of the reading comprehension section of the University Entrance Examination (UEE) for MA applicants into English programs at Iranian state universities

Read more

Summary

Introduction

Item measures may be affected by person grouping factors such as gender, L1 background, and ethnic background, among others, as well as by item grouping factors such as common input or stimulus, common response format, and item chaining, to name but a few. SAGE Open address LID: (a) Testlet data have been fitted to score-based polytomous IRT models such as the graded response model (Samejima, 1969), polytomous logistic regression (Zumbo, 1999), or polytomous SIBTEST (Penfield & Lam, 2000). In these polytomous item response models, each testlet with m questions is treated as an item with the total score of the items within each testlet ranging from 0 to m. Score-based approaches have been criticized on the following grounds: (a) They do not take into account the exact response patterns of test takers to individual items within a testlet; a lot of information would be lost (Eckes, in press); and (b) applying polytomous IRT models to capture testlet effect has been reported to lead to biased parameter estimates and substantial overestimation of reliability and test information values (Thissen, Steinberg, & Mooney, 1989; Wainer, 1995)

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.