Item Response Theory and Differential Item Functioning Analyses With the Suicidal Behaviors Questionnaire-Revised in US and Chinese Samples.

Abstract

Background: Despite the widespread use of the Suicidal Behaviors Questionnaire-Revised (SBQ-R) and advances in item response theory (IRT) modeling, item-level analysis with the SBQ-R has been minimal. Aims: This study extended IRT modeling strategies to examine the response parameters and potential differential item functioning (DIF) of the individual SBQ-R items in samples of US (N = 320) and Chinese (N = 298) undergraduate students. Method: Responses to the items were calibrated using the unidimensional graded response IRT model. Goodness-of-fit, item parameters, and DIF were evaluated. Results: The unidimensional graded response IRT model provided a good fit to the sample data. Results showed that the SBQ-R items had various item discrimination parameters and item severity parameters. Also, each SBQ-R item functioned similarly between the US and Chinese respondents. In particular, Item 1 (history of attempts) demonstrated high discrimination and severity of suicide-related thoughts and behaviors (STBs). Limitations: The use of cross-sectional data from convenience samples of undergraduate students could be considered a major limitation. Conclusion: The findings from the IRT analysis provided empirical support that each SBQ-R item taps into STBs and that scores for Item 1 can be used for screening purposes.
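The calibration described above uses Samejima's graded response model, in which each item has a discrimination parameter and ordered severity thresholds, and category probabilities are differences of cumulative logistic curves. A minimal sketch in Python, using hypothetical parameter values rather than the paper's estimates:

```python
import math

def grm_category_probs(theta, a, b):
    """Samejima's graded response model for one polytomous item.

    theta -- latent trait (e.g., STB severity) value
    a     -- item discrimination parameter
    b     -- ordered severity thresholds, one per cumulative boundary
    Returns P(X = k) for categories k = 0..len(b).
    """
    # Cumulative probabilities P(X >= k): P(X >= 0) = 1 by definition,
    # each boundary is a 2PL-type logistic curve, and P(X >= K+1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    cum.append(0.0)
    # Category probabilities are adjacent differences of the cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]

# Hypothetical 4-category item: moderate discrimination, thresholds
# spread across the trait continuum.
probs = grm_category_probs(theta=0.0, a=1.8, b=[-1.0, 0.2, 1.5])
```

A highly discriminating, high-severity item such as the abstract's Item 1 would correspond to a large `a` and large positive thresholds `b`, so endorsing the upper categories signals elevated trait levels.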

Similar Papers
  • Research Article
  • Citations: 13
  • 10.3389/fpsyg.2016.00255
Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions
  • Feb 24, 2016
  • Frontiers in Psychology
  • Yoon Soo Park + 2 more

This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of the Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD under the mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and the distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need for caution and for evaluating IPD within a mixture IRT framework to understand its effects on item parameters and examinee ability.

  • Research Article
  • Citations: 5
  • 10.1007/s11136-023-03446-6
Measuring PROMIS pain interference in German patients with chronic conditions: calibration, validation, and cross-cultural use of item parameters
  • Jun 2, 2023
  • Quality of Life Research
  • Alexander Obbarius + 6 more

Purpose: To calibrate the item parameters of the German PROMIS® Pain Interference (PROMIS PI) items using an item response theory (IRT) model and to investigate the psychometric properties of the item bank. Methods: Forty items of the PROMIS PI item bank were collected in a convenience sample of 660 patients, who were recruited during inpatient rheumatological treatment or outpatient psychosomatic medicine visits in Germany. Unidimensionality, monotonicity, and local independence were tested as required for IRT analyses. Unidimensionality was examined using confirmatory factor analysis (CFA) and exploratory factor analysis (EFA). Unidimensional and bifactor graded-response IRT models were fitted to the data. Bifactor indices were used to investigate whether multidimensionality would lead to biased scores. To evaluate convergent and discriminant validity, the item bank was correlated with legacy pain instruments. Potential differential item functioning (DIF) was examined for gender, age, and subsample. To investigate whether U.S. item parameters may be used to derive T-scores in German patients, T-scores based on previously published U.S. and newly estimated German item parameters were compared with each other after adjusting for sample-specific differences. Results: All items were sufficiently unidimensional, locally independent, and monotonic. Whereas the fit of the unidimensional IRT model was not acceptable, a bifactor IRT model demonstrated acceptable fit. Explained common variance and omega hierarchical suggested that using the unidimensional model would not lead to biased scores. One item demonstrated DIF between subsamples. High correlations with legacy pain instruments supported construct validity of the item bank. T-scores based on U.S. and German item parameters were similar, suggesting that U.S. parameters could be used in German samples. Conclusion: The German PROMIS PI item bank proved to be a clinically valid and precise instrument for assessing pain interference in patients with chronic conditions.

  • Dissertation
  • 10.17077/etd.005181
Comparison of some detection methods on unidimensional IRT calibration
  • Apr 9, 2020
  • Lu Wang + 5 more

Nowadays it is not uncommon for tests, especially high-stakes assessments, to be administered under time constraints. When a test is constructed to assess examinees’ abilities in academic knowledge, but the imposed time limits affect examinees’ test performance, speededness effects become a concern. Under such circumstances, inaccurate psychometric results and inferences might be drawn if unidimensional item response theory (IRT) models are applied in testing practice. Speededness detection methods have been proposed to identify speeded responses and examinees. Thus, the purpose of the study was to comprehensively investigate how various detection methods combined with various calibration treatments compared in reducing speededness effects under the 2PL and 3PL IRT models with: (1) simulated test data under various speededness conditions, and (2) real test data. Both simulated and real data analyses were conducted in this study. Two simulation studies were conducted. For the first simulation study, two main factors were investigated: (1) degree of speededness (three levels: none, 10%, and 25%), and (2) IRT calibration model (two models: 2PL and 3PL). The performance of various combinations of detection methods and calibration treatments was evaluated by assessing Pearson correlation, item parameter recovery, and model-data fit statistics. Data generated in the second simulation study were based on the estimated person and item parameter values obtained from IRT model calibration of the real data used in this study. Thus, the second simulation study served as a link between the pure simulation study and the real data study, because such a generation process enabled the simulated dataset to carry some characteristics of the real data while the true parameter values were known. The real data came from a large pool of high-stakes standardized assessment items.
In the current study, it was found that treating the identified speeded responses as “not-presented” could always lead to more accurate psychometric results compared to the other calibration treatments across various speededness levels under both the 2PL and 3PL IRT models. When the speededness level was large, “removing speeded examinees” could usually yield results comparable to the “not-presented” treatment across different detection methods, and it is a feasible and easily implemented option in practice. In addition, it was found that detection methods using the item response time (RT) distribution as a speededness indicator (i.e., the INSPECT and VITP methods in the current study) generally showed better performance than the other detection methods in dealing with speededness effects. Moreover, in this study, it was found that the inclusion of the c-parameter could deal well with a rapid-guessing strategy. Thus, when the speededness level was not large and was mainly caused by rapid-guessing behavior, “no treatment” under the 3PL IRT model yielded accurate psychometric results. The findings of the current study provide several feasible options for practitioners when speededness is a concern and unidimensional IRT models are used in the calibration or scoring process. It is hoped that this study will inspire researchers and practitioners to develop new detection methods, or new ways of dealing with speededness effects, under unidimensional IRT models.

  • Research Article
  • Citations: 7
  • 10.1208/s12248-020-00500-w
Item Response Theory Modeling of the International Prostate Symptom Score in Patients with Lower Urinary Tract Symptoms Associated with Benign Prostatic Hyperplasia
  • Aug 27, 2020
  • The AAPS Journal
  • Yassine Kamal Lyauk + 4 more

Item response theory (IRT) was used to characterize the time course of lower urinary tract symptoms due to benign prostatic hyperplasia (BPH-LUTS) measured by item-level International Prostate Symptom Scores (IPSS). The Fisher information content of the IPSS items was determined, and the power to detect a drug effect using the IRT approach was examined. Data from 403 patients with moderate-to-severe BPH-LUTS in a placebo-controlled phase II trial studying the effect of degarelix over 6 months were used for modeling. Three pharmacometric models were developed: a model for total IPSS, a unidimensional IRT model, and a bidimensional IRT model, the latter separating voiding and storage items. The population-level time course of BPH-LUTS in all models was described by initial improvement followed by worsening. In the unidimensional IRT model, the combined information content of the IPSS voiding items represented 72% of the total information content, indicating that the voiding subscore may be more sensitive to changes in BPH-LUTS than the storage subscore. The pharmacometric models showed considerably higher power to detect a drug effect than cross-sectional and while-on-treatment analyses of covariance. Compared with the sample size required to detect a drug effect at 80% power with the total IPSS model, a reduction of 5.9% and 11.7% was obtained with the unidimensional and bidimensional IPSS IRT models, respectively. Pharmacometric IRT analysis of the IPSS within BPH-LUTS may increase the precision and efficiency of treatment effect assessment, albeit to a more limited extent than in applications in other therapeutic areas.
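For a binary 2PL item, the Fisher information referred to above has the closed form I(θ) = a²·p(θ)·(1 − p(θ)), which peaks at θ = b; summing item informations and taking ratios gives share figures like the 72% quoted for the voiding items. A sketch with hypothetical parameters, not the IPSS estimates:

```python
import math

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def information_share(theta, items, subset):
    """Fraction of total test information contributed by a subset of items."""
    total = sum(item_information_2pl(theta, a, b) for a, b in items)
    part = sum(item_information_2pl(theta, a, b) for a, b in subset)
    return part / total

# Hypothetical two-item test: the more discriminating item dominates
# the information near its own difficulty.
items = [(1.6, 0.0), (0.8, 1.0)]
share = information_share(theta=0.0, items=items, subset=[items[0]])
```

Because information is additive over items, a subscale's sensitivity at a given trait level can be compared directly this way, which is the logic behind contrasting the voiding and storage subscores.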

  • Research Article
  • Citations: 74
  • 10.1177/0013164410378690
Target Rotations and Assessing the Impact of Model Violations on the Parameters of Unidimensional Item Response Theory Models
  • Jun 29, 2011
  • Educational and Psychological Measurement
  • Steven Reise + 2 more

Reise, Cook, and Moore proposed a “comparison modeling” approach to assess the distortion in item parameter estimates when a unidimensional item response theory (IRT) model is imposed on multidimensional data. Central to their approach is the comparison of item slope parameter estimates from a unidimensional IRT model (a restricted model), with the item slope parameter estimates from the general factor in an exploratory bifactor IRT model (the unrestricted comparison model). In turn, these authors suggested that the unrestricted comparison bifactor model be derived from a target factor rotation. The goal of this study was to provide further empirical support for the use of target rotations as a method for deriving a comparison model. Specifically, we conducted Monte Carlo analyses exploring (a) the use of the Schmid–Leiman orthogonalization to specify a viable initial target matrix and (b) the recovery of true bifactor pattern matrices using target rotations as implemented in Mplus. Results suggest that to the degree that item response data conform to independent cluster structure, target rotations can be used productively to establish a plausible comparison model.

  • Book Chapter
  • 10.1007/978-3-319-93177-7_4
Item Response Theory
  • Jan 1, 2018
  • Patrick Mair

Item response theory (IRT) is a psychometric modeling framework for analyzing categorical data from questionnaires, tests, and other instruments that aim to measure underlying latent traits. Simply speaking, these models estimate a parameter for each item, as well as a parameter for each person. Depending on how many latent traits are involved, a core distinction in IRT is between unidimensional and multidimensional IRT models. Hence, dimensionality assessment is important before fitting an IRT model, as elaborated in the first section. Subsequently, the focus is on various classical unidimensional models for dichotomous as well as polytomous input data. Afterward, three sections cover various special topics in IRT: item/test information, sample size determination, and differential item functioning, where differences in the item parameters are examined across person subgroups. Some modern IRT flavors are presented in the final three sections on multidimensional IRT, longitudinal IRT, and Bayesian IRT.

  • Book Chapter
  • 10.1007/978-3-030-74772-5_36
Psychometric Models for a New State Science Assessment Aligned to the Next Generation Science Standards
  • Jan 1, 2021
  • Jing Chen + 3 more

The complexity of the Next Generation Science Standards (NGSS) poses significant task design, psychometric, and practical challenges for assessments. This study focuses on the psychometric challenges and explores an appropriate measurement model to interpret scores for an NGSS-aligned state science assessment. Multiple item response theory (IRT) models based on content specifications were applied to the data collected from a pilot test of the newly developed science assessment to identify the most appropriate model. Results suggest that although the three-dimensional IRT model that aligns with the NGSS dimensions provides slightly better overall model fit than the unidimensional IRT model and the testlet model, the item-level fit of the three-dimensional model is poor. Implementing multidimensional IRT (MIRT) models requires large sample sizes and a much longer estimation time, which poses challenges in an operational setting. Future studies can be conducted to further evaluate the need for using MIRT models and the robustness of a unidimensional model under various test conditions. Keywords: multidimensional science assessment design; NGSS-aligned assessment; multidimensional IRT models.

  • Research Article
  • Citations: 2
  • 10.5539/jel.v6n4p113
Examination of Different Item Response Theory Models on Tests Composed of Testlets
  • Jun 14, 2017
  • Journal of Education and Learning
  • Esin Yilmaz Kogar + 1 more

The purpose of this research is first to estimate the item and ability parameters, and the standard errors of those parameters, obtained from unidimensional item response theory (UIRT), bifactor (BIF), and testlet response theory (TRT) models in tests that include testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The mathematics test from PISA 2012 served as the data collection tool, and 36 items were used to construct six data sets containing different numbers of testlets and independent items. From these data sets, three sample sizes of 250, 500, and 1,000 persons were drawn randomly. The findings showed that the lowest mean error values were generally those obtained from UIRT, and that TRT yielded a lower mean estimation error than BIF. Under all conditions, models that account for local dependence provided better model-data fit than UIRT; generally there was no meaningful difference between BIF and TRT, and both models can be used for these data sets. When a meaningful difference between the two models did appear, BIF generally yielded the better result. In addition, across sample sizes and data sets, the correlations of the item and ability parameter estimates, and of their errors, were generally high.

  • Research Article
  • 10.18637/jss.v103.i12
[RETRACTED ARTICLE] irtplay: An R Package for Unidimensional Item Response Theory Modeling
  • Jan 1, 2022
  • Journal of Statistical Software
  • Hwanggyu Lim + 1 more

Item response theory (IRT) is a general framework in which mathematical models are formulated to explain the relationship between an examinee's observable response on an item and the latent ability measured by a test. Applications of IRT models and related statistical methods are commonly found in educational and psychological research. An important step in applying IRT models to test data is estimating the IRT model parameters. Accordingly, the successful application of IRT rests on satisfactory statistical techniques and software for accurately estimating the model parameters. The irtplay R package was developed to provide users with a user-friendly experience and convenience when analyzing test data using unidimensional IRT models. The package can be used to fit IRT models to a mixture of dichotomous and polytomous item data using marginal maximum likelihood estimation via the expectation-maximization algorithm, calibrate pretest items, and estimate examinees' latent ability parameters. In addition, the package provides practical tools that conveniently enable users to conduct many IRT-related analyses, such as evaluating IRT model-data fit, analyzing differential item functioning, computing asymptotic variance-covariance matrices of item parameter estimates, calculating the conditional probability distribution of observed scores using the Lord and Wingersky (1984) formula, and importing item and ability parameter estimates from the output of popular IRT software. The main features of the irtplay package are illustrated using three data examples.

  • Research Article
  • 10.31158/jeev.2022.35.3.521
Multidimensional IRT Scale Linking Methods for Mixed-Format Tests Based on Bifactor Models
  • Sep 30, 2022
  • Korean Society for Educational Evaluation
  • Seonghoon Kim

Like unidimensional item response theory (IRT) models, bifactor models in multidimensional IRT have a scale indeterminacy problem, and due to this problem scale linking methods are needed to place all bifactor model parameter estimates from separate calibrations on a common ability scale. Four bifactor scale linking methods including the direct least squares (DLS), mean/least squares (MLS), item category response function (ICRF), and test response function (TRF) methods have been presented for use with single-format tests. Parallel to the 2006 paper of Kim and Lee, this paper extends the four scale linking methods to a mixture of bifactor models for mixed-format tests. Each linking method extended is intended to handle mixed-format tests using any mixture of the following bifactor extensions of four unidimensional IRT models: the bifactor three-parameter logistic, bifactor graded response, bifactor generalized partial credit, and bifactor nominal response models. For generality, symmetric criterion functions are proposed for the ICRF and TRF methods. Given two sets of parameter estimates for the common items linking two test forms, each linking method estimates the dilation (slope) and translation (intercept) coefficients of a linear transformation. Simulations are conducted to investigate the performance of the four linking methods. The results indicate that overall, the ICRF method performs very well, the MLS and DLS methods perform well (the MLS method is slightly better than the DLS method), and the TRF method performs poorly in estimating the linking coefficients. The inferiority of the TRF method is mainly due to its poor estimation of the translation coefficients.

  • Research Article
  • Citations: 33
  • 10.1080/00273171.2018.1455572
Robustness of Parameter Estimation to Assumptions of Normality in the Multidimensional Graded Response Model
  • Apr 6, 2018
  • Multivariate Behavioral Research
  • Chun Wang + 2 more

A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.

  • Research Article
  • 10.3758/s13428-025-02666-7
Two-part sequential measurement models for distinguishing between symptom presence and symptom severity.
  • May 22, 2025
  • Behavior research methods
  • Scott A Baldwin + 1 more

Two common aspects of symptom measurement are 1) the occurrence or presence of symptoms, and 2) the intensity or severity of symptoms when they occur. We adopt a latent trait perspective based on item response theory (IRT), using both unidimensional and multidimensional IRT models. We demonstrate how to (a) prepare data for analysis, (b) specify, estimate, and compare models, (c) interpret model parameters, (d) compare scores from models, and (e) visualize analysis results. We develop the relevant sequential IRT model, noting its flexibility, congruence with the theorized data generating process for symptom measures, and its promise for facilitating additional research and practical applications. The sequential model is less frequently used than other IRT models for polytomous data such as the generalized partial credit or graded response models. However, estimation of the sequential model can be readily accomplished with standard latent variable modeling and IRT software for binary indicators that allows constraints on the discrimination parameters. We compare the sequential model to other modeling options. We provide discussion of future research directions.
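Under the sequential (continuation-ratio) model described above, a polytomous response is generated step by step: the respondent "passes" each severity step with a 2PL-type probability, and the observed category is the number of consecutive steps passed. This is why, as the abstract notes, software for binary indicators can estimate it. A minimal sketch with hypothetical step parameters:

```python
import math

def sequential_probs(theta, a, b):
    """Sequential (continuation-ratio) IRT model for one item.

    a, b -- per-step discrimination and difficulty parameters
            (hypothetical values, equal length).
    Returns P(X = k) for categories k = 0..len(b).
    """
    probs = []
    reach = 1.0  # probability of having passed all earlier steps
    for ak, bk in zip(a, b):
        s = 1.0 / (1.0 + math.exp(-ak * (theta - bk)))  # pass step k
        probs.append(reach * (1.0 - s))  # fail here -> stop in category k
        reach *= s
    probs.append(reach)  # passed every step -> highest category
    return probs

# Hypothetical symptom item with 3 steps: presence, mild-to-moderate,
# and moderate-to-severe.
p = sequential_probs(theta=0.0, a=[1.5, 1.2, 1.2], b=[-0.5, 0.5, 1.5])
```

The first step maps naturally onto symptom presence and the remaining steps onto severity given presence, which is the two-part distinction the paper draws.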

  • Research Article
  • Citations: 1
  • 10.3389/feduc.2022.801372
Investigating Subscores of VERA 3 German Test Based on Item Response Theory/Multidimensional Item Response Theory Models
  • Apr 8, 2022
  • Frontiers in Education
  • Güler Yavuz Temel + 3 more

In this study, the psychometric properties of the listening and reading subtests of the German VERA 3 test were examined using Item Response Theory (IRT) and Multidimensional Item Response Theory (MIRT) models. Listening and reading subscores were estimated using unidimensional Rasch, 1PL, and 2PL models, and total scores on the German test (listening + reading) were estimated using unidimensional and multidimensional IRT models. Various MIRT models were used, and model fit was compared in a cross-validation study. The results of the study showed that unidimensional models of the reading and listening subtests and the German test provided a good overall model-data fit, however, multidimensional models of the subtests provided a better fit. The results demonstrated that, although the subtest scores also fit adequately independently, estimating the scores of the overall test with a model (e.g., bifactor) that includes a general factor (construct) in addition to the subfactors significantly improved the psychometric properties of the test. A general factor was identified that had the highest reliability values; however, the reliabilities of the specific factors were very low. In addition to the fit of the model data, the fit of the persons with IRT/MIRT models was also examined. The results showed that the proportion of person misfit was higher for the subtests than for the overall tests, but the overfit was lower. NA-German students, who did not speak German all-day, had the highest proportion of misfits with all models.

  • Research Article
  • 10.7822/omuefd.1419482
Investigation of Models Used in Equating Testlet-Based Tests
  • Feb 5, 2024
  • Ondokuz Mayis University Journal of Education Faculty
  • Ertunç Ukşul + 1 more

This study aims to examine the effects of testlets on test equating. For this purpose unidimensional item response theory, two-factor item response theory and testlet response theory models were applied to the testlet-based tests for the estimation of item and ability parameters. In order to equate the tests, the parameters were placed on the common scale using mean-mean, mean-sigma and Stocking-Lord scale transformation methods under the common-item non-equivalent groups design. Then, the equating errors of the models depending on the scale transformation method and the number of testlets were calculated and compared. Equating errors were compared with Root Mean Squared Error. In the study, the science test of the Trends in International Mathematics and Science Study project administered in 2019 was used as the data collection tool. As a result of the study, it was determined that the use of unidimensional item response theory model increased the equating error, while the use of two-factor and testlet response theory models decreased the equating error as the number of testlets in the test increased. In order to compare the models, the correlation between the parameters obtained from the models after scale transformation was examined and it was found that the item parameters were more affected by the model selection than the ability parameter. In addition, it was concluded that the equating errors obtained from the mean-mean and Stocking-Lord scale transformation methods were lower than the mean-sigma method.

  • Research Article
  • Citations: 4
  • 10.21449/ijate.790289
A Guide for More Accurate and Precise Estimations in Simulative Unidimensional IRT Models
  • Jun 10, 2021
  • International Journal of Assessment Tools in Education
  • Fulya Barış Pekmezci + 1 more

A great deal of research on item response theory (IRT) is conducted through simulation. Item and ability parameters are estimated with varying numbers of replications under different test conditions. However, it is not clear what the appropriate number of replications should be. The aim of the current study is to develop guidelines for the adequate number of replications in conducting Monte Carlo simulation studies involving unidimensional IRT models. For this aim, 192 simulation conditions, which included four sample sizes, two test lengths, eight replication numbers, and unidimensional IRT models, were generated. The accuracy and precision of item and ability parameter estimates and model fit values were evaluated with respect to the number of replications. In this context, for the item and ability parameters, mean error, root mean square error, and standard error of estimates were considered; for model fit, M_2, RMSEA_2, and Type I error rates were considered. Although the number of replications did not seem to influence model fit, it was decisive for Type I error inflation and the accuracy of error estimation for all IRT models. It was concluded that, to get more accurate results, the number of replications should be at least 625 in terms of the accuracy of Type I error rate estimation for all IRT models. Also, 156 replications and above can be recommended. Item parameter biases were examined, and the largest bias values were obtained from the 3PL model. It can be concluded that the increase in the number of parameters estimated by the model resulted in more biased estimates.
