Abstract

The R package plink has been developed to facilitate the linking of mixed-format tests for multiple groups under a common item design using unidimensional and multidimensional IRT-based methods. This paper presents the capabilities of the package in the context of the unidimensional methods. The package supports nine unidimensional item response models (the Rasch model, 1PL, 2PL, 3PL, graded response model, partial credit and generalized partial credit model, nominal response model, and multiple-choice model) and four separate calibration linking methods (mean/sigma, mean/mean, Haebara, and Stocking-Lord). It also includes functions for importing item and/or ability parameters from common IRT software, conducting IRT true-score and observed-score equating, and plotting item response curves and parameter comparison plots.
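
For orientation, each of the four linking methods named above estimates the slope A and intercept B of a linear transformation that places one group's theta scale onto the other's; the methods differ only in how A and B are obtained. The block below states this in generic notation (mine, not the paper's), with the mean/sigma estimators shown as an example; it is a standard result from the common-item linking literature rather than a formula taken from this article.

```latex
% Common-item (non-equivalent groups) linking: rescale group X onto group Y.
% mu and sigma denote the mean and standard deviation of the common-item
% parameter estimates; notation is generic, not the paper's.
\begin{align*}
  \theta_Y &= A\,\theta_X + B, \qquad
  b_{jY} = A\,b_{jX} + B, \qquad
  a_{jY} = a_{jX}/A \\[4pt]
  \text{mean/sigma:}\quad
  A &= \frac{\sigma(b_Y)}{\sigma(b_X)}, \qquad
  B = \mu(b_Y) - A\,\mu(b_X)
\end{align*}
```

The mean/mean method instead uses the ratio of mean common-item discriminations for A, while the Haebara and Stocking-Lord methods choose A and B by minimizing differences between item characteristic curves and test characteristic curves, respectively.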

Highlights

  • In many measurement scenarios there is a need to compare results from multiple tests, but depending on the statistical properties of these measures and/or the sample of examinees, scores across tests may not be directly comparable; in most instances they are not

  • This model is typically identified using the same constraints on ajk and bjk as the nominal response model (NRM), and given that cjk represents the proportion of individuals who “guessed” a specific distractor, the multiple-choice model (MCM) imposes an additional constraint: the cjk must sum to one across the response categories

  • There are four elements that must be created to prepare the data prior to linking a set of tests using the function plink: 1. an object containing the item parameters, 2. an object specifying the number of response categories for each item, 3. an object identifying the item response models associated with each item, and 4. an object identifying the common items between groups (see the sketch following this list)
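
As a rough illustration of how these four elements come together, the sketch below follows the two-group, dichotomous common-item pattern used in the package's documentation. The KB04 example data and the exact argument names (cat, poly.mod, rescale, base.grp, D) reflect my reading of the plink API and may differ across package versions; treat this as a sketch rather than a verbatim excerpt from the paper.

```r
## Minimal sketch (assumed KB04 example data; argument names may vary by version)
library(plink)

## (1) item parameters: a list with one parameter matrix per group
pars <- KB04$pars
## (4) common items: a matrix pairing common-item positions across the two groups
common <- KB04$common

## (2) number of response categories for each item (all items dichotomous here)
n.cat <- list(rep(2, 36), rep(2, 36))

## (3) item response model for each item: "drm" covers the dichotomous
## Rasch/1PL/2PL/3PL family; one poly.mod object per group
pm <- as.poly.mod(36, model = "drm")

## Combine the four elements into a single irt.pars object
x <- as.irt.pars(pars, common, cat = n.cat, poly.mod = list(pm, pm))

## Estimate the linking constants; rescale = "SL" asks for the item parameters to
## be rescaled with the Stocking-Lord constants, using group 2 as the base scale
out <- plink(x, rescale = "SL", base.grp = 2, D = 1.7)

## Constants for all four methods (mean/sigma, mean/mean, Haebara, Stocking-Lord)
summary(out)
```

Keeping the response-category and model specifications separate from the parameter matrices is what allows a single irt.pars object to describe mixed-format tests, where different items on the same form follow different response models.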

Summary

Introduction

In many measurement scenarios there is a need to compare results from multiple tests, but depending on the statistical properties of these measures and/or the sample of examinees, scores across tests may not be directly comparable; in most instances they are not. Linking methods were originally developed to equate observed scores for parallel test forms (Hull 1922; Kelley 1923; Gulliksen 1950; Levine 1955). These approaches work well when the forms are similar in terms of difficulty and reliability, but as the statistical specifications of the tests diverge, the comparability of scores across tests becomes increasingly unstable (Petersen, Cook, and Stocking 1983; Yen 1986). Thurstone (1925, 1938) developed observed score methods for creating vertical scales when the difficulties of the linked tests differ substantively. These methods depend on item p-values or empirical score distributions, which are themselves dependent on the sample of examinees and the particular items included on the tests. The following two sub-sections are included to acquaint the reader with the specific parameterizations used in the package.

Item response models
Calibration methods
Preparing the data
Formatting the item parameters
Specifying response categories
Specifying item response models
Combining elements and identifying common items
Importing parameters from IRT software
Running the calibration
Additional features
Computing response probabilities
IRT true-score and observed-score equating
Plotting results
Related software
Comparing the applications