fcirt: An R Package for Forced Choice Models in Item Response Theory
Multidimensional forced choice (MFC) formats have emerged as a promising alternative to traditional single-statement Likert-type measures for assessing noncognitive traits while reducing response biases. As MFC formats become more widely used, there is a growing need for tools to support MFC analysis, which motivated the development of the fcirt package. The fcirt package estimates forced choice model parameters using Bayesian methods. It currently supports the Multi-Unidimensional Pairwise Preference (MUPP) model based on the Generalized Graded Unfolding Model (GGUM; Roberts et al., 2000), estimated with rstan, which implements the Hamiltonian Monte Carlo (HMC) sampling algorithm. fcirt also includes functions for computing item and test information functions to evaluate the quality of MFC assessments, as well as Bayesian diagnostic plots to assist with model evaluation and convergence assessment.
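For readers unfamiliar with the response function behind the package, the GGUM-based MUPP combination rule can be sketched in a few lines of R. This is an illustration of the model fcirt estimates (Stark et al., 2005), not code taken from the package itself; the function names are ours.

```r
# Dichotomous GGUM (Roberts et al., 2000) with C = 1 response categories
# (disagree/agree), so M = 2C + 1 = 3: probability of agreeing with one
# statement given trait level theta, discrimination alpha, location delta,
# and threshold tau.
ggum_p <- function(theta, alpha, delta, tau) {
  num <- exp(alpha * (1 * (theta - delta) - tau)) +
         exp(alpha * (2 * (theta - delta) - tau))
  den <- 1 + exp(alpha * (3 * (theta - delta))) + num
  num / den
}

# MUPP combination rule (Stark et al., 2005): probability of preferring
# statement s (measuring trait theta_s) over statement t (trait theta_t),
# conditional on endorsing exactly one of the two statements.
mupp_p <- function(theta_s, theta_t, alpha_s, delta_s, tau_s,
                   alpha_t, delta_t, tau_t) {
  ps <- ggum_p(theta_s, alpha_s, delta_s, tau_s)
  pt <- ggum_p(theta_t, alpha_t, delta_t, tau_t)
  ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)
}

# Example: one pair whose statements tap two different traits
mupp_p(theta_s = 1.0, theta_t = -0.5,
       alpha_s = 1.2, delta_s = 0.8, tau_s = -1.0,
       alpha_t = 0.9, delta_t = -0.3, tau_t = -0.8)
```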
- Research Article
15
- 10.1027/1015-5759/a000609
- Jul 1, 2020
- European Journal of Psychological Assessment
When constructing a questionnaire to assess a psychological construct, one important decision researchers have to make is how to collect responses from test takers; that is, which response format to implement. We argued in a previous editorial published in the European Journal of Psychological Assessment (EJPA) that this decision deserves more attention and should be an explicit step in the test construction process (Wetzel & Greiff, 2018). The reason for this is that it can be a consequential decision that influences the validity of conclusions we draw about test takers' trait levels or about relations between constructs and criteria (Brown & Maydeu-Olivares, 2013; Wetzel & Frick, 2020). In this editorial, which can be considered a follow-up to the first one, we will take a closer look at two response formats¹: rating scales (RS), the current default in most questionnaires, and the multidimensional forced-choice (MFC) format, an alternative that is currently the focus of a considerable body of research. We will first define the two formats and point out some of their advantages and disadvantages. Then, we will provide a summary and evaluation of research comparing RS and MFC. Third, we will draw some preliminary conclusions on the feasibility of applying MFC as an alternative to RS. Fourth, we will point out some open research questions. We will end with some recommendations and implications for readers and authors of EJPA. In this editorial, the overall goal is to give researchers and test users an overview of the current state of the research on RS versus MFC and to provide guidance on the feasibility of applying MFC in research on psychological assessment. ¹ The multidimensional forced-choice format is both an item and a response format. For simplicity in the comparison with rating scales, we refer to it as a response format.
- Research Article
20
- 10.3758/s13428-019-01274-6
- Jul 24, 2019
- Behavior Research Methods
Likert-type measures have been criticized in psychological assessment because they are vulnerable to response biases, including central tendency, acquiescence, leniency, halo, and socially desirable responding. As an alternative, multidimensional forced choice (MFC) testing has been proposed to address these concerns. A number of researchers have developed item response theory (IRT) models for MFC data and have examined latent trait estimation with tests of different dimensionality and length. Research has also explored the advantages of computerized adaptive testing (CAT) with MFC pair tests having as many as 25 dimensions, but there have been no published studies on CAT with MFC triplets or tetrads. Thus, in this research we aimed to address that issue. We used recently developed item information functions for an MFC ranking model to compare the benefits of CAT with MFC pair, triplet, and tetrad tests. A simulation study showed that CAT substantially outperformed nonadaptive testing for latent trait estimation across MFC formats. More importantly, CAT with MFC pairs provided estimation accuracy similar to or better than that from tests of equivalent numbers of nonadaptive MFC triplets. On the basis of these findings, implications and recommendations are further discussed for constructing MFC measures to use in psychological contexts.
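The adaptive selection step the study simulates can be described generically: at each stage, evaluate the information of every unadministered block at the interim trait estimate and administer the most informative one. A minimal, hypothetical R sketch (block_info is an assumed user-supplied information function, not from the paper):

```r
select_next_block <- function(block_info, theta_hat, administered, n_blocks) {
  # Maximum-information selection: among blocks not yet administered,
  # pick the one with the largest information at the current estimate.
  candidates <- setdiff(seq_len(n_blocks), administered)
  info <- vapply(candidates, function(b) block_info(b, theta_hat), numeric(1))
  candidates[which.max(info)]
}
```

For multidimensional traits, the information of a block is a matrix rather than a scalar, so a scalar criterion such as the determinant (D-optimality) is typically maximized instead.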
- Research Article
11
- 10.1080/00273171.2021.1960142
- Jul 27, 2021
- Multivariate Behavioral Research
This research developed a new ideal point-based item response theory (IRT) model for multidimensional forced choice (MFC) measures. We adapted the Zinnes and Griggs (ZG; 1974) IRT model and the multi-unidimensional pairwise preference (MUPP; Stark et al., 2005) model, henceforth referred to as ZG-MUPP. We derived the information function to evaluate the psychometric properties of MFC measures and developed a model parameter estimation algorithm using Markov chain Monte Carlo (MCMC). To evaluate the efficacy of the proposed model, we conducted a simulation study under various experimental conditions such as sample sizes, number of items, and ranges of discrimination and location parameters. The results showed that the model parameters were accurately estimated when the sample size was as low as 500. The empirical results also showed that the scores from the ZG-MUPP model were comparable to those from the MUPP model and the Thurstonian IRT (TIRT) model. Practical implications and limitations are further discussed.
- Research Article
19
- 10.1177/1094428120959822
- Oct 8, 2020
- Organizational Research Methods
Although modern item response theory (IRT) methods of test construction and scoring have overcome ipsativity problems historically associated with multidimensional forced choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where item refers to a block, or group, of statements presented for an examinee’s consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained and free baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated the free baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large sample size and large magnitude of DIF conditions, and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example to demonstrate the viability of the best performing method with real examinees and showed how a DIF and a DTF effect size measure can be used to assess the practical significance of MFC DIF findings.
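The omnibus Wald test at the core of this design is standard: stack the reference-focal differences of the studied block's loadings and thresholds and compare the quadratic form to a chi-square. A generic R sketch, not the authors' code:

```r
wald_dif_test <- function(est_ref, est_foc, vcov_diff) {
  # Omnibus Wald test of parameter equality across groups:
  # W = d' V^{-1} d ~ chi-square(df = length(d)) under the null of no DIF,
  # where d stacks loading/threshold differences for the studied block and
  # vcov_diff is the covariance matrix of those differences.
  d <- est_ref - est_foc
  W <- drop(t(d) %*% solve(vcov_diff) %*% d)
  c(W = W, df = length(d),
    p = pchisq(W, df = length(d), lower.tail = FALSE))
}
```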
- Research Article
31
- 10.1177/0146621618768294
- Apr 23, 2018
- Applied Psychological Measurement
Historically, multidimensional forced choice (MFC) measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for interindividual comparisons. However, with the recent advent of item response theory (IRT) scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components in high-stakes evaluation settings. This article aims to add to burgeoning methodological advances in MFC measurement by focusing on statement and person parameter recovery for the GGUM-RANK (generalized graded unfolding-RANK) IRT model. A Markov chain Monte Carlo (MCMC) algorithm was developed for estimating GGUM-RANK statement and person parameters directly from MFC rank responses. Simulation studies examined how the psychometric properties of the statements composing MFC items, test length, and sample size influenced statement and person parameter estimation, and explored the benefits of measurement using MFC triplets relative to pairs. To demonstrate this methodology, an empirical validity study was then conducted using an MFC triplet personality measure. The results and implications of these studies for future research and practice are discussed.
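The GGUM-RANK structure can be sketched by generalizing the MUPP pair formula: the probability that a statement is ranked first is the probability that it alone is endorsed, normalized over the competing statements, and a full ranking multiplies successive first choices. An illustrative R sketch under that reading; inputs are GGUM agreement probabilities (e.g., from a function like ggum_p in the sketch above), and this is our reconstruction rather than the authors' code.

```r
ggum_rank_triplet <- function(p_s, p_t, p_u) {
  # Probability of the observed ranking s > t > u, built from the GGUM
  # agreement probabilities of the three statements at the respondent's
  # trait levels: first choice among all three, then among the remaining two.
  first <- p_s * (1 - p_t) * (1 - p_u) /
    (p_s * (1 - p_t) * (1 - p_u) +
     (1 - p_s) * p_t * (1 - p_u) +
     (1 - p_s) * (1 - p_t) * p_u)
  second <- p_t * (1 - p_u) / (p_t * (1 - p_u) + (1 - p_t) * p_u)
  first * second
}
```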
- Research Article
2
- 10.1111/bmsp.12303
- Mar 26, 2023
- British Journal of Mathematical and Statistical Psychology
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.
- Research Article
- 10.1177/01466216251415189
- Jan 3, 2026
- Applied Psychological Measurement
The field of psychometrics has made remarkable progress in developing item response theory (IRT) models for analyzing multidimensional forced choice (MFC) measures. This study introduces an innovative method that enhances the latent trait estimation of the Multi-Unidimensional Pairwise Preference (MUPP) model by incorporating latent regression modeling. To validate the efficacy of the new method, we conducted a comprehensive simulation study. The results of the study provide compelling evidence that the proposed latent regression MUPP (LR-MUPP) model significantly improves the accuracy of the latent trait estimation. This study opens new avenues for future research and encourages further development and refinement of MFC IRT models and their applications.
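The latent regression component can be made concrete with a small simulation sketch: instead of drawing every trait from a common standard-normal prior, each person's prior mean is predicted from covariates, which is what sharpens the trait estimates. Names and values below are illustrative, not taken from the paper.

```r
set.seed(1)
n <- 1000
X <- cbind(1, gender = rbinom(n, 1, 0.5))  # hypothetical person covariates
beta <- c(0, 0.5)                          # hypothetical regression weights
# Latent regression: theta_i ~ N(x_i' beta, sigma^2) replaces theta_i ~ N(0, 1)
theta <- rnorm(n, mean = drop(X %*% beta), sd = 0.8)
```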
- Research Article
74
- 10.1177/0013164417693666
- Feb 1, 2017
- Educational and Psychological Measurement
Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for the various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic IRT model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
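In the spirit of the article, here is a compact, self-contained example of fitting a simple IRT model with rstan. It is a two-parameter logistic sketch written in current Stan array syntax, not the article's published code for the 3PL, graded, or nominal models.

```r
library(rstan)

stan_2pl <- "
data {
  int<lower=1> N;                       // persons
  int<lower=1> J;                       // items
  array[N, J] int<lower=0, upper=1> y;  // binary responses
}
parameters {
  vector[N] theta;                      // latent traits
  vector<lower=0>[J] a;                 // discriminations
  vector[J] b;                          // difficulties
}
model {
  theta ~ normal(0, 1);
  a ~ lognormal(0, 1);
  b ~ normal(0, 2);
  for (n in 1:N)
    for (j in 1:J)
      y[n, j] ~ bernoulli_logit(a[j] * (theta[n] - b[j]));
}
"

# Simulate a small data set and fit with HMC
set.seed(123)
N <- 200; J <- 10
theta <- rnorm(N); a <- rlnorm(J, 0, 0.3); b <- rnorm(J)
p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
y <- matrix(rbinom(N * J, 1, p), N, J)
fit <- stan(model_code = stan_2pl, data = list(N = N, J = J, y = y),
            chains = 2, iter = 1000)
traceplot(fit, pars = c("a[1]", "b[1]"))  # convergence diagnostics
```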
- Research Article
63
- 10.1080/10705511.2011.581993
- Jun 30, 2011
- Structural Equation Modeling: A Multidisciplinary Journal
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
- Research Article
3
- 10.5750/ijpcm.v6i4.614
- Feb 2, 2017
- International Journal of Person Centered Medicine
Background: More robust and rigorous psychometric models, such as Item Response Theory (IRT) models, have been advocated for applications measuring health sciences outcomes. However, there are challenges to the use of IRT models with health assessments. In particular, item responses from measuring health-related outcomes are typically determined by multiple traits or dimensions. This multidimensionality can be caused by various factors, including a designed multidimensional structure of the instrument, heterogeneity in item content, and other sources such as differential item functioning in subpopulations and individual differences in response styles to survey items and rating scales. Objectives: This paper discusses different extensions to IRT models that can be used to account for different types of multidimensionality, as well as the use of Bayesian methods in person-centered medicine research. Methods: Use of the SAS PROC MCMC platform for implementing Bayesian analyses is illustrated to estimate and analyze IRT applications to health-related assessments. Results: PROC MCMC involves a straightforward translation of the response probability model along with specifications of the model parameters and prior distributions for the model parameters. Conclusions: Bayesian analysis of multidimensional IRT models is more accessible to researchers and scale developers measuring health sciences outcomes for person-centered medicine research.
- Research Article
29
- 10.1177/00131644211045351
- Sep 13, 2021
- Educational and Psychological Measurement
Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based procedures for classifying response engagement and IRT models for response engagement are based on common ideas, and we propose the distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widespread Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on the basis of publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and individuals’ proficiency estimates relative to a conventional IRT model.
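The shared idea behind these models can be written down compactly: each item response is a mixture of an engaged process, which follows a conventional IRT model, and a disengaged process, such as random guessing, with the response time entering either as an indicator or a predictor of the class. A schematic R likelihood for a single response under the "response time reflects engagement" variant (all names illustrative):

```r
response_likelihood <- function(y, rt, pi_engaged, p_irt, p_guess,
                                dens_rt_engaged, dens_rt_disengaged) {
  # Mixture over the item-level latent engagement class: engaged responses
  # follow the IRT probability p_irt, disengaged ones a guessing probability
  # p_guess; each class contributes its own response-time density.
  pi_engaged * dbinom(y, 1, p_irt) * dens_rt_engaged(rt) +
    (1 - pi_engaged) * dbinom(y, 1, p_guess) * dens_rt_disengaged(rt)
}
```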
- Book Chapter
- 10.4324/9781315871493-17
- Aug 20, 2015
This chapter highlights some of the unique types of item response theory (IRT) models that have emerged in support of computer-based testing (CBT). The multicategory scoring of many item types used in CBT has made polytomous IRT models of significant value, and these models are useful in evaluating the extent to which innovative item types are statistically improving measurement efficiency. It is also important to acknowledge other variants of testlet-based administration that can impact IRT modelling. IRT models of increased relevance in CBT include multidimensional IRT (MIRT) models, in which item scores are modelled as a function of multiple person abilities. The diversity of IRT and IRT-related models needed for CBT has led to new thinking about how IRT models function within a broader assessment framework. The computer offers much exciting future work within the field of psychometrics for those who like to think creatively about the use of models in assessment contexts.
- Research Article
10
- 10.1016/j.jkss.2019.04.001
- May 17, 2019
- Journal of the Korean Statistical Society
A comparison of Monte Carlo methods for computing marginal likelihoods of item response theory models
- Front Matter
27
- 10.1016/s1551-7144(09)00212-2
- Jan 1, 2010
- Contemporary Clinical Trials
Classical and modern measurement theories, patient reports, and clinical outcomes
- Research Article
1
- 10.1027/2698-1866/a000044
- Aug 1, 2023
- Psychological Test Adaptation and Development
The present paper features the adaptation of an existing Big Five questionnaire with a rating scale (RS) response format into a measure using a multidimensional forced choice (MFC) response format. Rating scale response formats have been criticized for their proneness to intentional and unintentional response distortions. Multidimensional forced choice response formats were suggested as a solution to mitigate several types of response sets and response styles by design. The Big Five Inventory of Personality in Occupational Situations (B5PS) is a situation-based questionnaire designed for personnel selection and development purposes, which would benefit from fake-proof response formats. MFC response formats require special effort during test construction and calibration, which is laid out here. Changing the response format has severe consequences for item design and scoring. An inherent issue with MFC formats derives from their inability to yield interpersonally comparable results from standard (sum) scoring. This issue can be solved with item response theory (IRT)-based calibration during test construction. The Thurstonian IRT approach (TIRT) was developed by Brown and Maydeu-Olivares (2011), and aspects of MFC item design and TIRT calibration are explored in this paper. Evidence on structural and construct validity is presented alongside recommendations on the test development process. The results support the feasibility of MFC test construction with TIRT calibration in a contextualized and situation-based item format.
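The TIRT building block used in such a calibration can be sketched directly: in the normal-ogive formulation of Brown and Maydeu-Olivares (2011), the probability of preferring one statement over another in a block depends on the difference of the two latent utilities. An illustrative R version (parameter names are ours):

```r
tirt_pair_p <- function(eta_i, eta_k, lambda_i, lambda_k,
                        gamma_ik, psi2_i, psi2_k) {
  # Probability that statement i is preferred over statement k, where
  # eta_* are the traits the two statements measure, lambda_* their factor
  # loadings, gamma_ik the pair threshold, and psi2_* the uniqueness
  # variances of the latent utilities.
  pnorm((-gamma_ik + lambda_i * eta_i - lambda_k * eta_k) /
          sqrt(psi2_i + psi2_k))
}
```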