Abstract
The <b>R</b> package <b>plink</b> has been developed to facilitate the linking of mixed-format tests for multiple groups under a common item design using unidimensional and multidimensional IRT-based methods. This paper presents the capabilities of the package in the context of the unidimensional methods. The package supports nine unidimensional item response models (the Rasch model, 1PL, 2PL, 3PL, graded response model, partial credit and generalized partial credit models, nominal response model, and multiple-choice model) and four separate calibration linking methods (mean/sigma, mean/mean, Haebara, and Stocking-Lord). It also includes functions for importing item and/or ability parameters from common IRT software, conducting IRT true-score and observed-score equating, and plotting item response curves and parameter comparison plots.
Highlights
In many measurement scenarios there is a need to compare results from multiple tests, but depending on the statistical properties of these measures and/or the sample of examinees, scores across tests may not be directly comparable; in most instances, they are not.
This model is typically identified using the same constraints on ajk and bjk as the nominal response model (NRM), and given that cjk represents the proportion of individuals who “guessed” a specific distractor, the multiple-choice model (MCM) imposes an additional constraint: these proportions must sum to one across the response categories.
There are four necessary elements that must be created to prepare the data prior to linking a set of tests using the function plink:
1. an object containing the item parameters,
2. an object specifying the number of response categories for each item,
3. an object identifying the item response models associated with each item, and
4. an object identifying the common items between groups.
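The four elements above can be sketched as follows for a simple two-group design with dichotomous (3PL) items. This is a minimal illustration, not a definitive recipe: the parameter values and object names are hypothetical, and the exact argument details of `as.poly.mod`, `as.irt.pars`, and `plink` should be checked against the package documentation.

```r
library(plink)

# Hypothetical 3PL item parameters (columns: a, b, c) for two groups.
# Items 4-5 in group 1 are the same items as 1-2 in group 2.
pars.grp1 <- matrix(c(1.2,  0.0, 0.20,
                      0.8, -1.0, 0.15,
                      1.0,  0.5, 0.20,
                      1.1,  1.0, 0.25,
                      0.9, -0.5, 0.20), ncol = 3, byrow = TRUE)
pars.grp2 <- matrix(c(1.0,  1.2, 0.25,
                      0.9, -0.3, 0.20,
                      1.3,  0.2, 0.20,
                      0.7,  0.8, 0.15,
                      1.1, -1.2, 0.20), ncol = 3, byrow = TRUE)

# Element 2: number of response categories for each item (all dichotomous)
cat.grp1 <- rep(2, 5)
cat.grp2 <- rep(2, 5)

# Element 3: item response model for each item (dichotomous response model)
pm <- as.poly.mod(5)

# Element 4: common items -- rows pair item positions across the two groups
common <- matrix(c(4, 1,
                   5, 2), ncol = 2, byrow = TRUE)

# Element 1 (combined): build the irt.pars object and run the linking
x <- as.irt.pars(list(pars.grp1, pars.grp2), common,
                 cat = list(cat.grp1, cat.grp2),
                 poly.mod = list(pm, pm))
out <- plink(x, rescale = "SL")  # Stocking-Lord characteristic curve method
```

Once the `irt.pars` object is assembled, switching among the four linking methods is a matter of changing a single argument, which is the main convenience the consolidated data structure buys.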
Summary
In many measurement scenarios there is a need to compare results from multiple tests, but depending on the statistical properties of these measures and/or the sample of examinees, scores across tests may not be directly comparable; in most instances, they are not. Linking methods were originally developed to equate observed scores for parallel test forms (Hull 1922; Kelley 1923; Gulliksen 1950; Levine 1955). These approaches work well when the forms are similar in terms of difficulty and reliability, but as the statistical specifications of the tests diverge, the comparability of scores across tests becomes increasingly unstable (Petersen, Cook, and Stocking 1983; Yen 1986). Thurstone (1925, 1938) developed observed-score methods for creating vertical scales when the difficulties of the linked tests differ substantively. These methods depend on item p-values or empirical score distributions, which are themselves dependent on the sample of examinees and the particular items included on the tests. The following two sub-sections are included to acquaint the reader with the specific parameterizations used in the package.