Item response theory – A first approach
Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models – IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years, owing to increasing computing power and to a demand for more accurate and more meaningful inferences grounded in complex data. Developments in modeling with Item Response Theory have gone hand in hand with developments in estimation theory, most notably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also given rise to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been drawn repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items, and their multidimensional analogues, to models that incorporate information about the cognitive sub-processes that influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
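To make the three logistic models concrete, here is a minimal Python sketch of the three-parameter logistic (3PL) item response function; the two-parameter (c = 0) and one-parameter/Rasch (c = 0, common a) models fall out as special cases. Parameter values are illustrative, and the scaling constant D ≈ 1.7 sometimes included in these formulas is omitted.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model, for ability
    theta, discrimination a, difficulty b, and pseudo-guessing parameter c."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = 0.0
print(p_3pl(theta, a=1.2, b=-0.5, c=0.2))  # 3PL
print(p_3pl(theta, a=1.2, b=-0.5, c=0.0))  # 2PL: no guessing parameter
print(p_3pl(theta, a=1.0, b=-0.5, c=0.0))  # 1PL/Rasch: common discrimination
```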
- Front Matter
28
- 10.1016/s1551-7144(09)00212-2
- Jan 1, 2010
- Contemporary Clinical Trials
Classical and modern measurement theories, patient reports, and clinical outcomes
- Research Article
38
- 10.1080/00949655.2011.603090
- Feb 1, 2013
- Journal of Statistical Computation and Simulation
Markov chain Monte Carlo (MCMC) algorithms have been shown to be useful for estimation of complex item response theory (IRT) models. Although an MCMC algorithm can be very useful, it also requires care in use and interpretation of results. In particular, MCMC algorithms generally make extensive use of priors on model parameters. In this paper, MCMC estimation is illustrated using a simple mixture IRT model, a mixture Rasch model (MRM), to demonstrate how the algorithm operates and how results may be affected by some commonly used priors. Priors on the probabilities of mixtures, label switching, model selection, metric anchoring, and implementation of the MCMC algorithm using WinBUGS are described, and their effects on parameter recovery in practical testing situations are illustrated. In addition, an example is presented in which an MRM is fitted to a set of educational test data using the MCMC algorithm, and the results are compared with those from three existing maximum likelihood estimation methods.
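As a rough illustration of the kind of machinery described above (not the authors' actual algorithm or priors), the sketch below runs Metropolis-within-Gibbs sampling for a plain Rasch model with standard normal priors on abilities and difficulties; all data and tuning constants are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 200 persons answering 10 Rasch items.
n_persons, n_items = 200, 10
theta_true = rng.normal(size=n_persons)
b_true = rng.normal(size=n_items)
X = rng.binomial(1, 1 / (1 + np.exp(-(theta_true[:, None] - b_true[None, :]))))

def loglik(theta, b, axis):
    """Per-person (axis=1) or per-item (axis=0) Bernoulli log-likelihood."""
    logits = theta[:, None] - b[None, :]
    return np.sum(X * logits - np.log1p(np.exp(logits)), axis=axis)

theta, b = np.zeros(n_persons), np.zeros(n_items)
draws = []
for it in range(3000):
    # Random-walk Metropolis update for each person's ability (N(0,1) prior).
    prop = theta + 0.5 * rng.normal(size=n_persons)
    log_r = loglik(prop, b, 1) - loglik(theta, b, 1) + 0.5 * (theta**2 - prop**2)
    theta = np.where(np.log(rng.uniform(size=n_persons)) < log_r, prop, theta)
    # Random-walk Metropolis update for each item difficulty (N(0,1) prior).
    prop = b + 0.3 * rng.normal(size=n_items)
    log_r = loglik(theta, prop, 0) - loglik(theta, b, 0) + 0.5 * (b**2 - prop**2)
    b = np.where(np.log(rng.uniform(size=n_items)) < log_r, prop, b)
    if it >= 1000:  # discard burn-in, keep the rest
        draws.append(b.copy())

print(np.mean(draws, axis=0))  # posterior mean difficulties, compare to b_true
```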
- Research Article
15
- 10.3758/brm.41.4.1127
- Nov 1, 2009
- Behavior Research Methods
Item response theory (IRT) models are the central tools in modern measurement and advanced psychometrics. We offer a MATLAB IRT modeling (IRTm) toolbox that is freely available and that follows an explicit design matrix approach, giving the end user control and flexibility in building a model that goes beyond standard models, such as the Rasch model (Rasch, 1960) and the two-parameter logistic model. As such, IRTm allows for a large variety of unidimensional IRT models for binary responses, the incorporation of additional person and item information, and deviations from common model assumptions. An exclusive key feature of the toolbox is the inclusion of copula IRT models to handle local item dependencies. Two appendixes for this report, containing example code and information on the general copula IRT in IRTm, may be downloaded from brm.psychonomic-journals.org/content/supplemental.
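The design-matrix idea can be sketched generically (the names below are illustrative, not IRTm's actual API): structural parameters are mapped to item parameters through a user-specified matrix, so constraints, item covariates, and deviations from standard models are all encoded in that matrix.

```python
import numpy as np

n_items = 4
# Identity design matrix: one free difficulty per item (unconstrained model).
Z_free = np.eye(n_items)
# Grouped design matrix: items 1-2 and 3-4 share a difficulty (constrained model).
Z_grouped = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

beta = np.array([-0.5, 0.8])   # structural parameters to be estimated
b = Z_grouped @ beta           # implied item difficulties
print(b)                       # [-0.5 -0.5  0.8  0.8]
```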
- Research Article
3
- 10.3390/a17040153
- Apr 6, 2024
- Algorithms
Item response theory (IRT) models are frequently used to analyze multivariate categorical data from questionnaires or cognitive test data. In order to reduce the model complexity in item response models, regularized estimation is now widely applied, adding a nondifferentiable penalty function, such as the LASSO or the SCAD penalty, to the log-likelihood function in the optimization. In most applications, regularized estimation repeatedly estimates the IRT model on a grid of regularization parameters λ. The final model is selected for the parameter that minimizes the Akaike or Bayesian information criterion (AIC or BIC). In recent work, it has been proposed to directly minimize a smooth approximation of the AIC or the BIC for regularized estimation. This approach circumvents the repeated estimation of the IRT model. As a result, the computation time is substantially reduced. The adequacy of the new approach is demonstrated by three simulation studies focusing on regularized estimation for IRT models with differential item functioning, multidimensional IRT models with cross-loadings, and the mixed Rasch/two-parameter logistic IRT model. It was found from the simulation studies that the computationally less demanding direct optimization based on the smooth variants of AIC and BIC had comparable or improved performance compared to the ordinarily employed repeated regularized estimation based on AIC or BIC.
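A minimal sketch of the smooth information criterion idea, assuming one common differentiable surrogate (x²/(x² + ε)) for the count of nonzero penalized parameters; the paper's exact approximation may differ.

```python
import numpy as np

def smooth_count(gamma, eps=1e-3):
    """Differentiable surrogate for the number of nonzero penalized parameters."""
    gamma = np.asarray(gamma, dtype=float)
    return float(np.sum(gamma**2 / (gamma**2 + eps)))

def smooth_bic(neg2_loglik, gamma, n_obs, n_unpenalized):
    """BIC with the parameter count replaced by its smooth approximation,
    so the criterion itself can be minimized by a gradient-based optimizer."""
    return neg2_loglik + np.log(n_obs) * (n_unpenalized + smooth_count(gamma))

print(smooth_bic(neg2_loglik=5200.0, gamma=[0.0, 0.01, 0.7],
                 n_obs=1000, n_unpenalized=20))
```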
- Research Article
6
- 10.1177/0962280213504177
- Jul 11, 2016
- Statistical Methods in Medical Research
Both item response theory and structural equation models are useful in the analysis of ordered categorical responses from health assessment questionnaires. We highlight the advantages and disadvantages of the item response theory and structural equation modelling approaches to modelling ordinal data, from within a community health setting. Using data from the SPARCLE project focussing on children with cerebral palsy, this paper investigates the relationship between two ordinal rating scales, the KIDSCREEN, which measures quality-of-life, and Life-H, which measures participation. Practical issues relating to fitting models, such as non-positive definite observed or fitted correlation matrices, and approaches to assessing model fit are discussed. Item response theory models allow properties such as the conditional independence of particular domains of a measurement instrument to be assessed. When, as with the SPARCLE data, the latent traits are multidimensional, structural equation models generally provide a much more convenient modelling framework.
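For the ordered categorical responses that both approaches model, one standard IRT choice is the graded response model; the sketch below is illustrative only and is not claimed to be the model fitted in the SPARCLE analyses.

```python
import numpy as np

def grm_probs(theta, a, thresholds):
    """Graded response model: category probabilities are differences of
    adjacent cumulative logistic curves P(X >= k)."""
    cum = 1 / (1 + np.exp(-a * (theta - np.asarray(thresholds))))
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]

# An item with four ordered categories (three increasing thresholds).
print(grm_probs(theta=0.3, a=1.5, thresholds=[-1.0, 0.0, 1.2]))
```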
- Book Chapter
- 10.4324/9781315871493-17
- Aug 20, 2015
This chapter seeks to highlight some of the unique types of item response theory (IRT) models that have emerged in support of computer-based testing (CBT). The multicategory scoring of many innovative item types used in CBT has made polytomous IRT models of significant value in CBT. The polytomous IRT models are useful in evaluating the extent to which the innovative item types are statistically improving measurement efficiency. It is important to acknowledge other variants of testlet-based administration that can impact IRT modelling. The IRT models of increased relevance in CBT include multidimensional IRT (MIRT) models. In MIRT, item scores are modelled as a function of multiple person abilities. The diversity of IRT and IRT-related models needed for CBT has led to new thinking about how IRT models function within a broader assessment framework. The computer offers much exciting future work within the field of psychometrics for those who like to think creatively about the use of models in assessment contexts.
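A minimal sketch of a compensatory MIRT item response function of the kind referred to above, with invented parameter values; the chapter does not prescribe this exact parameterization.

```python
import numpy as np

def mirt_prob(theta, a, d):
    """Compensatory MIRT: P(correct) = logistic(a . theta + d), so strength
    on one ability can compensate for weakness on another."""
    return 1 / (1 + np.exp(-(np.dot(a, theta) + d)))

# An item loading on two abilities.
print(mirt_prob(theta=np.array([0.5, -0.2]), a=np.array([1.2, 0.8]), d=-0.3))
```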
- Research Article
31
- 10.1177/00131644211045351
- Sep 13, 2021
- Educational and Psychological Measurement
Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based procedures for classifying response engagement and IRT models for response engagement are based on common ideas, and we propose the distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widespread Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on the basis of publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and of individuals’ proficiency estimates relative to a conventional IRT model.
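A hedged sketch of the shared idea behind these models: marginally, a response mixes an engaged process (here, a 2PL) with a disengaged process (random responding), and in the "dependent" variants the response time predicts class membership. The functional forms and names below are illustrative, not any one published specification.

```python
import numpy as np

def p_correct(theta, a, b, g, pi_engaged):
    """Marginal P(correct): engaged responses follow a 2PL; disengaged
    responses are correct at a chance rate g."""
    p_2pl = 1 / (1 + np.exp(-a * (theta - b)))
    return pi_engaged * p_2pl + (1 - pi_engaged) * g

def pi_from_logtime(log_rt, gamma0=-2.0, gamma1=1.5):
    """'Dependent' variant: log response time predicts engagement probability."""
    return 1 / (1 + np.exp(-(gamma0 + gamma1 * log_rt)))

pi = pi_from_logtime(log_rt=3.0)
print(pi, p_correct(theta=0.4, a=1.1, b=0.0, g=0.25, pi_engaged=pi))
```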
- Research Article
3
- 10.1080/00131881.2012.658201
- Mar 1, 2012
- Educational Research
Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues with on-demand testing is that some of the methods used for maintaining the comparability of standards over time in conventional testing are no longer available and the development of new methods is required. Purpose: This paper proposes an item response theory (IRT) framework for implementing on-demand testing and maintaining the comparability of standards over time for general qualifications, including GCSEs and GCE A levels, in the UK and discusses procedures for its practical implementation. Sources of evidence: Sources of evidence include literature from the fields of on-demand testing, the design of computer-based assessment, the development of IRT, and the application of IRT in educational measurement. Main argument: On-demand testing presents many advantages over conventional testing. In view of the nature of general qualifications, including the use of multiple components and multiple question types, the advances made in item response modelling over the past 30 years, and the availability of complex IRT analysis software systems, coupled with increasing IRT expertise in awarding organisations, IRT models could be used to implement on-demand testing in high stakes examinations in the UK. The proposed framework represents a coherent and complete approach to maintaining standards in on-demand testing. The procedures for implementing the framework discussed in the paper could be adapted by practitioners to suit their own needs and circumstances. Conclusions: The use of IRT to implement on-demand testing could prove to be one of the viable approaches to maintaining standards over time or between test sessions for UK general qualifications.
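One concrete ingredient of such a framework can be sketched as follows (an illustration under assumed 2PL item parameters, not the paper's full procedure): once items are pre-calibrated onto a common ability scale, a grade boundary fixed as an ability value translates into an expected raw score on any new on-demand form via the test characteristic curve.

```python
import numpy as np

def test_characteristic_curve(theta, a, b):
    """Expected raw score on a form: the sum of its 2PL item probabilities."""
    return float(np.sum(1 / (1 + np.exp(-a * (theta - b)))))

theta_cut = 0.5  # grade boundary on the common ability scale (assumed)
a_form = np.array([1.0, 1.3, 0.8])   # invented calibrated item parameters
b_form = np.array([-0.4, 0.2, 0.9])
print(test_characteristic_curve(theta_cut, a_form, b_form))
```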
- Research Article
15
- 10.1027/1015-5759/a000609
- Jul 1, 2020
- European Journal of Psychological Assessment
When constructing a questionnaire to assess a psychological construct, one important decision researchers have to make is how to collect responses from test takers; that is, which response format to implement. We argued in a previous editorial published in the European Journal of Psychological Assessment (EJPA) that this decision deserves more attention and should be an explicit step in the test construction process (Wetzel & Greiff, 2018). The reason for this is that it can be a consequential decision that influences the validity of conclusions we draw about test takers' trait levels or about relations between constructs and criteria (Brown & Maydeu-Olivares, 2013; Wetzel & Frick, 2020). In this editorial, which can be considered a follow-up to the first one, we will take a closer look at two response formats¹: rating scales (RS), the current default in most questionnaires, and the multidimensional forced-choice (MFC) format, an alternative that is currently the focus of a considerable body of research. We will first define the two formats and point out some of their advantages and disadvantages. Then, we will provide a summary and evaluation of research comparing RS and MFC. Third, we will draw some preliminary conclusions on the feasibility of applying MFC as an alternative to RS. Fourth, we will point out some open research questions. We will end with some recommendations and implications for readers and authors of EJPA. In this editorial, the overall goal is to give researchers and test users an overview of the current state of the research on RS versus MFC and to provide guidance on the feasibility of applying MFC in research on psychological assessment. ¹ The multidimensional forced-choice format is both an item and a response format. For simplicity in the comparison with rating scales, we refer to it as a response format.
- Research Article
17
- 10.3102/1076998607306451
- Dec 1, 2008
- Journal of Educational and Behavioral Statistics
The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear mixed effects and/or item response theory model. Although the individual true item responses are masked through randomizing the responses, the model extension enables the computation of individual true item response probabilities and estimates of individuals’ sensitive behavior/attitude and their relationships with background variables taking into account any clustering of respondents. Results are presented from a College Alcohol Problem Scale (CAPS) where students were interviewed via direct questioning or via a randomized response technique. A Markov Chain Monte Carlo algorithm is given for estimating simultaneously all model parameters given hierarchical structured binary or polytomous randomized item response data and background variables.
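The specified relationship between randomized and true responses can be sketched for a forced-response design (one common randomized response scheme; the CAPS study's exact design is not restated here): with probability p the respondent answers truthfully, otherwise a forced answer is "yes" with probability c.

```python
def randomized_prob(p_true_yes, p_truthful=0.8, p_forced_yes=0.5):
    """P(observed 'yes') = p * P(true 'yes') + (1 - p) * c."""
    return p_truthful * p_true_yes + (1 - p_truthful) * p_forced_yes

def true_prob(p_obs_yes, p_truthful=0.8, p_forced_yes=0.5):
    """Invert the relationship to recover the true 'yes' probability."""
    return (p_obs_yes - (1 - p_truthful) * p_forced_yes) / p_truthful

obs = randomized_prob(p_true_yes=0.3)
print(obs, true_prob(obs))  # 0.34 observed, 0.3 recovered
```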
- Abstract
2
- 10.1182/blood-2021-148278
- Nov 5, 2021
- Blood
Validation of the Pyruvate Kinase Deficiency Diary (PKDD): A Patient-Reported Outcome Measure for Pyruvate Kinase (PK) Deficiency
- Research Article
- 10.15611/eada.2018.1.01
- Jan 1, 2018
- ECONOMETRICS
Item Response Theory (IRT) is an extension of Classical Test Theory (CTT) and focuses on how specific test items function in assessing a construct. IRT models are widely used in psychology, medicine, and marketing, as well as in the social sciences. An item response model specifies a relationship between the observable examinee test performance and the unobservable traits or abilities assumed to underlie performance on the test. Within the broad framework of item response theory, many models can be operationalized because of the large number of choices available for the mathematical form of the item characteristic curves. In this paper we introduce several types of IRT models, such as the Rasch model and the Birnbaum model. We present the main assumptions of IRT analysis, estimation methods, properties, and model selection methods. We also present the application of IRT analysis to binary data with the use of the ltm package in R.
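As one concrete model-selection device of the kind the paper surveys, the Rasch model is nested in the Birnbaum (2PL) model, so the two can be compared with a likelihood ratio test; the log-likelihood values below are placeholders that would in practice come from a fitting routine such as rasch() and ltm() in the R ltm package.

```python
from scipy.stats import chi2

def lrt_rasch_vs_2pl(loglik_rasch, loglik_2pl, n_items):
    """The 2PL frees one discrimination per item where the Rasch model uses
    a common one, so the LRT statistic is asymptotically chi-square with
    n_items - 1 degrees of freedom."""
    stat = 2 * (loglik_2pl - loglik_rasch)
    df = n_items - 1
    return stat, chi2.sf(stat, df)

print(lrt_rasch_vs_2pl(loglik_rasch=-5210.4, loglik_2pl=-5178.9, n_items=20))
```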
- Research Article
64
- 10.1080/10705511.2011.581993
- Jun 30, 2011
- Structural Equation Modeling: A Multidisciplinary Journal
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
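The RMSEA referred to above is computed from a chi-square fit statistic in the standard way; a minimal sketch, assuming the common N − 1 denominator, which the abstract does not specify:

```python
import math

def rmsea(chi2_stat, df, n_obs):
    """Root mean square error of approximation from a chi-square statistic."""
    return math.sqrt(max(0.0, chi2_stat - df) / (df * (n_obs - 1)))

print(rmsea(chi2_stat=180.0, df=90, n_obs=1000))
```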
- Book Chapter
4
- 10.1007/978-3-319-56294-0_7
- Jan 1, 2017
Markov chain Monte Carlo (MCMC) techniques have become popular for estimating item response theory (IRT) models. The current development of MCMC includes two major algorithms: Gibbs sampling and the No-U-Turn sampler (NUTS), which can be implemented in the specialized software packages JAGS and Stan, respectively. This study focused on comparing these two algorithms in estimating the two-parameter logistic (2PL) IRT model, where different prior specifications for the discrimination parameter were considered. Results suggest that Gibbs sampling performed similarly to the NUTS under most of the conditions considered. In addition, both algorithms recovered model parameters with a similar precision except in small sample size situations. Findings from this study also shed light on the use of the two MCMC algorithms with more complicated IRT models.
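To illustrate the kind of prior specification the study varies (the hyperparameters below are assumptions, not those of the study), a lognormal prior keeps the 2PL discrimination parameter positive; the sketch evaluates one item's unnormalized log posterior given known abilities.

```python
import numpy as np
from scipy.stats import lognorm, norm

def log_posterior_item(a, b, theta, x):
    """Unnormalized log posterior for one item's 2PL parameters, given
    abilities theta and that item's binary responses x."""
    logits = a * (theta - b)
    loglik = np.sum(x * logits - np.log1p(np.exp(logits)))
    log_prior = lognorm.logpdf(a, s=0.5) + norm.logpdf(b)  # assumed priors
    return loglik + log_prior

theta = np.random.default_rng(1).normal(size=100)
x = (np.random.default_rng(2).uniform(size=100) < 0.6).astype(int)
print(log_posterior_item(a=1.0, b=0.0, theta=theta, x=x))
```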
- Research Article
10
- 10.1093/swr/34.2.94
- Jun 1, 2010
- Social Work Research
The need to develop measures that tap into constructs of interest to social work, refine existing measures, and ensure that measures function adequately across diverse populations of interest is critical. Item response theory (IRT) is a modern measurement approach that is increasingly seen as an essential tool in a number of allied professions. IRT-based measurement uses a model-based approach that has several analytical and explanatory advantages over classical test theory. In particular, IRT-based techniques facilitate the process of specific item selection, allow for increased measurement precision with fewer items, and provide greater capacity for understanding and accounting for measurement bias across diverse populations. A survey of the top (as rated by impact factor) 20 social work journals revealed that few measurement articles in the social work literature use IRT or other modern measurement approaches. The benefit of incorporating more IRT-based approaches for developing, refining, and ensuring the application of measures to diverse populations is discussed. KEY WORDS: bias; classical test theory; item response theory; measurement; social work. The state of measurement within the social work literature is integrally related to knowledge base development and, ultimately, the extent to which research is able to meaningfully inform practice (Holden, Nizza, & Weissman, 1995). Scholarship highlights at least three measurement-related research domains within the field of social work. The first concerns the development of valid and reliable measures that capture the diverse set of phenomena relevant to social work, particularly those phenomena that may not be adequately represented by existing standardized instruments. The second is the assessment and validation of such measures. In particular, high-quality intervention research hinges on the validity and reliability of measures used to assess outcomes (Rosen, Proctor, & Staudt, 1999). Third, a growing body of literature challenges the extent to which well-validated measures adequately account and adjust for within- and across-population sources of diversity (see Ramirez, Ford, Stewart, & Teresi, 2005; Snowden, 2003), and such concerns are highly salient to social work's commitment to diversity-sensitive and -responsive research and practice. During the 1980s and 1990s, social work researchers outlined the relative benefits of item response theory (IRT) over classical test theory (CTT) measurement models, calling explicitly for IRT-based models' increased utilization to address measurement problems in social work research (DeRoos & Allen-Meares, 1993, 1998; Nugent & Hankins, 1989, 1992). Indeed, IRT models have largely subsumed CTT approaches within a wide range of allied fields and disciplines (for example, medicine, psychology, nursing, public health, education) (see Dunn, Resnicow, & Klesges, 2006; Embretson & Reise, 2000; Fries, Bruce, & Cella, 2005; Lord, 1980; Ware, Bjorner, & Kosinski, 2000). Given early interest among social work researchers and the recent proliferation of IRT methods within other applied social sciences, our overall objective in the present study was to assess the extent to which these methods are represented within social work research. This review thus realizes three overlapping aims.
First, it provides a description and comparison of IRT and CTT models and outlines the potential contributions of IRT methods to social work scholarship; it also briefly discusses IRT more generally as a latent variable model and its overlap with confirmatory factor analytic (CFA) and multi-level modeling methods. Second, it presents the results of a structured review assessing the penetration of IRT-based methods into the field of social work as reflected in key social work research journals. Third, using these results as a launching point, we highlight particular lines of inquiry within social work research where the application of IRT methods would likely yield substantial innovation. …