Abstract

Four item response theory (IRT) models were compared using data from tests in which multiple items were grouped into testlets, each focused on a common stimulus. In the bi‐factor model, each item was treated as a function of a primary trait plus a nuisance trait due to the testlet; in the testlet‐effects model, the slopes in the direction of the testlet traits were constrained within each testlet to be proportional to the slope in the direction of the primary trait; in the polytomous model, the item scores were summed into a single score for each testlet; and in the independent‐items model, the testlet structure was ignored. With simulated data, the independent‐items model somewhat overestimated reliability when the items were not independent within testlets. Under these nonindependent conditions, the independent‐items model also yielded greater root mean square error (RMSE) for item difficulty and underestimated the item slopes. When the items within testlets were instead generated to be independent, the bi‐factor model yielded somewhat higher RMSE in difficulty and slope. Similar differences between the models were illustrated with real data.
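
As an illustrative sketch only, the nesting of three of these models can be written out under assumptions the abstract does not state: dichotomous items, a logistic link, a slope-intercept parameterization, and standardized, mutually uncorrelated traits. All symbols are notation introduced here for illustration: \theta_j is the primary trait of person j, \gamma_{j,d(i)} is the nuisance trait for the testlet d(i) containing item i, a_i and a^{*}_i are the primary and testlet slopes, c_i is an intercept related to difficulty, and \lambda_{d} is a testlet-specific proportionality constant.

```latex
\begin{align}
% Bi-factor: each item has a free slope on the primary trait
% and a free slope on the trait of its own testlet.
\operatorname{logit} P(x_{ij}=1)
  &= a_{i}\,\theta_{j} + a^{*}_{i}\,\gamma_{j,d(i)} + c_{i} \\
% Testlet-effects: within testlet d(i), the testlet slope is
% proportional to the primary slope.
a^{*}_{i} &= \lambda_{d(i)}\,a_{i} \\
% Independent-items: the testlet structure is ignored.
a^{*}_{i} &= 0
\end{align}
```

Read this way, the testlet‐effects and independent‐items models are constrained versions of the bi‐factor model. The polytomous model is left out of the sketch because the abstract does not say which polytomous form was fitted to the summed testlet scores.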
