Item Response Theory Modeling of the Verb Naming Test.
Item response theory (IRT) is a modern psychometric framework with several advantages over classical test theory. IRT has been used successfully to model performance on anomia tests in individuals with aphasia; however, all efforts to date have focused on noun production accuracy. The purpose of this study was to evaluate whether the Verb Naming Test (VNT), a prominent test of action naming, can be successfully modeled under IRT, and to evaluate its reliability. We used responses on the VNT from 107 individuals with chronic aphasia from AphasiaBank. Unidimensionality and local independence, two assumptions prerequisite to IRT modeling, were evaluated using factor analysis and Yen's Q3 statistic (Yen, 1984), respectively. The assumption of equal discrimination among test items was evaluated statistically via nested model comparisons and practically via correlations of the resulting IRT-derived scores. Finally, internal consistency, marginal and empirical reliability, and conditional reliability were evaluated. The VNT was found to be sufficiently unidimensional, with the majority of item pairs demonstrating adequate local independence. An IRT model in which item discriminations were constrained to be equal demonstrated fit equivalent to a model in which a unique discrimination parameter was estimated for each item. All forms of reliability were strong across the majority of IRT ability estimates. Modeling the VNT using IRT is feasible, yielding ability estimates that are both informative and reliable. Future efforts are needed to quantify the validity of the VNT under IRT and to determine the extent to which it measures the same construct as other anomia tests. https://doi.org/10.23641/asha.22329235
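The local-independence check used in the study can be illustrated in code. The sketch below is a minimal, hypothetical implementation of Yen's Q3 statistic under a 2PL model with equal discriminations: after fitting, each item pair's Q3 is the correlation, across examinees, of the residuals (observed response minus model-expected probability). All parameter values and data here are simulated for illustration, not taken from the VNT analysis.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response (persons x items)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def yens_q3(responses, theta, a, b):
    """Matrix of pairwise Q3 residual correlations between items."""
    resid = responses - p_2pl(theta, a, b)   # persons x items residuals
    return np.corrcoef(resid, rowvar=False)  # items x items correlations

# Simulated illustration (invented parameters, not the VNT data)
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
a = np.ones(6)                  # equal discriminations, as in the constrained model
b = np.linspace(-1, 1, 6)
responses = (rng.random((500, 6)) < p_2pl(theta, a, b)).astype(float)

q3 = yens_q3(responses, theta, a, b)
```

A common heuristic flags item pairs whose |Q3| is noticeably larger than the rest (e.g., above roughly 0.2) as candidates for local dependence.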
- Front Matter
- 10.1016/s1551-7144(09)00212-2
- Jan 1, 2010
- Contemporary Clinical Trials
Classical and modern measurement theories, patient reports, and clinical outcomes
- Research Article
- 10.18269/jpmipa.v16i2.234
- Oct 1, 2011
- Jurnal Pengajaran Matematika dan Ilmu Pengetahuan Alam
This research, titled "Test Development and Analysis of a First-Grade Senior High School Final Examination in Chemistry Based on Classical Test Theory and Item Response Theory," was conducted to develop a standardized final-examination instrument for first-grade senior high school students, analyzed under both classical test theory and item response theory. The test is a multiple-choice test consisting of 75 items, each with five options. A research-and-development method was used to produce test items that satisfy item criteria such as validity, reliability, item discrimination, item difficulty, and distractor quality under classical test theory, and validity, reliability, item discrimination, item difficulty, and pseudo-guessing under item response theory. The three-parameter item response theory model was used in this research. The research-and-development process was carried out through a preliminary field test with 102 first-grade senior high school students. Based on the results, the test fulfills the criteria of a good instrument under both classical test theory and item response theory. The final examination items vary in quality, so some need revision to improve either the stem or the options. Of the 75 test items, 21 were rejected and 54 were accepted.
- Research Article
- 10.1093/swr/34.2.94
- Jun 1, 2010
- Social Work Research
The need to develop measures that tap into constructs of interest to social work, refine existing measures, and ensure that measures function adequately across diverse populations of interest is critical. Item response theory (IRT) is a modern measurement approach that is increasingly seen as an essential tool in a number of allied professions. IRT-based measurement uses a model-based approach that has several analytical and explanatory advantages over classical test theory. In particular, IRT-based techniques facilitate the process of specific item selection, allow for increased measurement precision with fewer items, and provide greater capacity for understanding and accounting for measurement bias across diverse populations. A survey of the top 20 social work journals (as rated by impact factor) revealed that few measurement articles in the social work literature use IRT or other modern measurement approaches. The benefit of incorporating more IRT-based approaches for developing, refining, and ensuring the application of measures to diverse populations is discussed. Key words: bias; classical test theory; item response theory; measurement; social work. The state of measurement within the social work literature is integrally related to knowledge base development and, ultimately, the extent to which research is able to meaningfully inform practice (Holden, Nizza, & Weissman, 1995). Scholarship highlights at least three measurement-related research domains within the field of social work. The first concerns the development of valid and reliable measures that capture the diverse set of phenomena relevant to social work, particularly those phenomena that may not be adequately represented by existing standardized instruments. The second is the assessment and validation of such measures.
In particular, high-quality intervention research hinges on the validity and reliability of measures used to assess outcomes (Rosen, Proctor, & Staudt, 1999). Third, a growing body of literature challenges the extent to which well-validated measures adequately account and adjust for within- and across-population sources of diversity (see Ramirez, Ford, Stewart, & Teresi, 2005; Snowden, 2003), and such concerns are highly salient to social work's commitment to diversity-sensitive and -responsive research and practice. During the 1980s and 1990s, social work researchers outlined the relative benefits of item response theory (IRT) over classical test theory (CTT) measurement models, calling explicitly for IRT-based models' increased utilization to address measurement problems in social work research (DeRoos & Allen-Meares, 1993, 1998; Nugent & Hankins, 1989, 1992). Indeed, IRT models have largely subsumed CTT approaches within a wide range of allied fields and disciplines (for example, medicine, psychology, nursing, public health, education) (see Dunn, Resnicow, & Klesges, 2006; Embretson & Reise, 2000; Fries, Bruce, & Cella, 2005; Lord, 1980; Ware, Bjorner, & Kosinski, 2000). Given early interest among social work researchers and the recent proliferation of IRT methods within other applied social sciences, our overall objective in the present study was to assess the extent to which these methods are represented within social work research. This review thus realizes three overlapping aims. First, it provides a description and comparison of IRT and CTT models and outlines the potential contributions of IRT methods to social work scholarship; it also briefly discusses IRT more generally as a latent variable model and its overlap with confirmatory factor analytic (CFA) and multilevel modeling methods.
Second, it presents the results of a structured review assessing the penetration of IRT-based methods into the field of social work as reflected in key social work research journals. Third, using these results as a launching point, we highlight particular lines of inquiry within social work research where the application of IRT methods would likely yield substantial innovation. …
- Research Article
- 10.12738/estp.2017.2.0246
- Jan 1, 2017
- Educational Sciences: Theory & Practice
Tests used for such purposes as determining educational quality, defining educational needs, hiring employees, selecting and placing students, and performing guidance and clinical services have an important place in education and psychology. Of course, they should have certain psychometric features related to the validity and reliability of test scores. Various test theories have helped to create more valid and reliable measurements and, as a result, to make better decisions regarding individuals. In education and psychology, Classical Test Theory (CTT) and Item Response Theory (IRT) are both widely used. CTT assumes that an individual's observed score is the total of the true score and the error score, while IRT estimates an individual's ability or latent trait from responses to test items (Embretson & Reise, 2000). When IRT assumptions and model-data fit are ensured, invariance of item and ability parameters holds; this is known as the most important advantage IRT has over CTT. Invariance of item and ability parameters means estimating ability parameters independently of the item sample and estimating item parameters independently of the ability sample. IRT's invariance feature makes it very practicable in many applications, for instance test development, computerized adaptive testing, bias studies, test equating, and item mapping (Hambleton & Swaminathan, 1985). IRT is classified into two main categories, parametric IRT (PIRT) and nonparametric IRT (NIRT) (Olivares, 2005; Sijtsma & Molenaar, 2002). To analyse ordered items, such as Likert-type attitude items and partial-credit cognitive items, or non-ordered graded items such as multiple-choice test items, item response models for polytomous items have been developed within IRT (Ostini & Nering, 2006). These models for polytomous items describe a non-linear relationship between an individual's latent trait and the probability of choosing a particular response category (Embretson & Reise, 2000).
The Graded Response Model (GRM), one of the IRT models developed for polytomous items, is often preferred by researchers since it is useful for presentations, portfolios, essays, and Likert-type items with ordered item categories (DeMars, 2010; Ostini & Nering, 2006). To scale tests consisting of polytomous items with accurate estimates under the GRM, PIRT's assumptions and model-data fit must be evaluated, and ensuring these assumptions and model-data fit requires large samples. At this point, NIRT models draw attention because they offer a practical advantage: the psychometric properties of tests can be determined with fewer items and respondents (Stout, 2001). NIRT models are defined as statistical scaling methods that require fewer assumptions than PIRT models for measuring persons and items (Stochl, 2007). With their wide application area, NIRT models are used in ordinal scales, applied research areas, sociology, marketing research, and health research on quality of life (Sijtsma, 2005). The literature reveals that two families of models are employed, namely the Mokken models and the nonparametric regression estimation models, each divided into sub-models. The Mokken family consists of the Monotone Homogeneity Model (MHM) and the Double Monotonicity Model (DMM). The nonparametric regression estimation models include such sub-models as the Kernel Smoothing Approach Model (KSAM), the Isotonic Regression Estimation, and the Smoothed Isotonic Regression Estimation models (Lee, 2007; Sijtsma & Molenaar, 2002). As theoretical work continues, new sub-models are being added to the nonparametric regression estimation family. As a NIRT model, the MHM requires the assumptions of unidimensionality, local independence, and monotonicity, and it describes the relationship between latent variables and items with homogeneous (unidimensional) and monotone item characteristic curves (ICCs) (Meijer & Baneke, 2004; Sijtsma & Molenaar, 2002).
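The category probabilities the GRM assigns can be written down compactly: cumulative boundary curves P(X ≥ k) are differenced to give the probability of each ordered category. The sketch below is an illustrative, assumed implementation with invented parameter values, not code from any of the cited studies.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the Graded Response Model.

    `thresholds` must be strictly increasing; with K-1 thresholds the
    item has K ordered categories."""
    # Boundary curves P(X >= k) for k = 1..K-1
    star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
    upper = np.concatenate(([1.0], star))   # P(X >= 0) = 1
    lower = np.concatenate((star, [0.0]))   # P(X >= K) = 0
    return upper - lower                    # P(X = k) for k = 0..K-1

# A 5-category Likert-type item with invented parameters
probs = grm_category_probs(theta=0.3, a=1.4, thresholds=[-1.5, -0.5, 0.6, 1.8])
```

Because the boundary curves telescope, the category probabilities always sum to one, and ordered thresholds guarantee they are nonnegative.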
…
- Research Article
- 10.1177/008124639002000408
- Dec 1, 1990
- South African Journal of Psychology
One of the great advantages of the modern approach to item and test analysis, namely Item Response Theory (IRT), over traditional Classical Test Theory (CTT) is that item statistics such as the difficulty index and the discrimination index are not sample dependent. Very few research findings using IRT models have been published locally. Hoffman is one of the small group of researchers who have implemented IRT models in South Africa. The purpose of the present study was to compare CTT and IRT item and test analysis. Data from the multiple-choice part of the 1988 examination paper on research methodology for 538 undergraduate UNISA Industrial Psychology students were used. Results showed that corresponding interpretations of item statistics (CTT) and item parameters (IRT) were possible under the two approaches. More or less the same conclusions could also be drawn with regard to the evaluation of test statistics. The near-perfect correlation between test scores (CTT) and ability estimates (IRT) for the one-parameter model indicates that a great similarity exists between them.
- Research Article
- 10.1177/0013164414559071
- Nov 20, 2014
- Educational and Psychological Measurement
There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior to the CTT framework for the purpose of estimating person and item parameters. In previous simulation studies, IRT models were used both as generating and as fitting models. Hence, results favoring the IRT framework could be attributed to IRT being the data-generation framework. Moreover, previous studies only considered the traditional CTT framework for the comparison, yet there is considerable literature suggesting that it may be more appropriate to use CTT statistics based on an underlying normal variable (UNV) assumption. The current study relates the class of CTT-based models with the UNV assumption to that of IRT, using confirmatory factor analysis to delineate the connections. A small Monte Carlo study was carried out to assess the comparability between the item and person statistics obtained from the frameworks of IRT and CTT with UNV assumption. Results show the frameworks of IRT and CTT with UNV assumption to be quite comparable, with neither framework showing an advantage over the other.
- Research Article
- 10.1136/postgradmedj-2014-133197
- Aug 1, 2015
- Postgraduate Medical Journal
BackgroundAlthough biostatistics and clinical epidemiology are essential for comprehending medical evidence, research has shown consistently low and variable knowledge among postgraduate medical trainees. Simultaneously, there has been an increase in...
- Research Article
- 10.1002/j.2333-8504.1982.tb01298.x
- Jun 1, 1982
- ETS Research Report Series
The Feasibility of Using Item Response Theory as a Psychometric Model for the GRE Aptitude Test
- Research Article
- 10.1186/s40359-023-01251-x
- Aug 7, 2023
- BMC Psychology
Background: Since March 2022, the COVID-19 epidemic has rebounded widely and frequently in China. Healthcare workers have faced major challenges, such as soaring numbers of COVID-19 patients, daily nucleic acid screening of entire populations in epidemic areas, and testing positive for COVID-19 themselves, all of which readily contribute to anxiety according to the Conservation of Resources theory. However, anxiety among healthcare workers is not only associated with personal health but also adversely affects the quality of health services. Therefore, it is crucial to find suitable tools to monitor COVID-19-related anxiety among healthcare workers. The current study aimed to test the Coronavirus Anxiety Scale (CAS) in Chinese healthcare workers. Methods: The current study employed a cross-sectional design. The CAS was translated into Chinese. Then, according to Classical Test Theory (CTT) and Item Response Theory (IRT) models, the psychometric properties of the Chinese version were measured among 811 healthcare workers. Results: The split-half reliability was 0.855. Cronbach's α coefficient was 0.895. The retest coefficient was 0.901 with a 10-day retest interval. The content validity index was 0.920. In exploratory factor analysis, one common factor was extracted, explaining 72.559% of the total variance. All item loadings on the common factor ranged from 0.790 to 0.885, and the communality of each item ranged from 0.625 to 0.784. In confirmatory factor analysis, the single-factor model showed excellent goodness of fit: chi-square/degrees of freedom (χ2/df) = 3.339, goodness-of-fit index (GFI) = 0.992, adjusted goodness-of-fit index (AGFI) = 0.975, root-mean-square error of approximation (RMSEA) = 0.054, root-mean-square residual (RMR) = 0.005, incremental fit index (IFI) = 0.967, Tucker-Lewis index (TLI) = 0.932, and comparative fit index (CFI) = 0.966.
Multiple-group confirmatory factor analysis revealed that the scale measured COVID-19 anxiety invariantly across ages, hospital levels, and professional titles. Regarding convergent validity, the CAS was positively correlated with post-traumatic stress disorder (r = 0.619, P < 0.001), fear of COVID (r = 0.550, P < 0.001), and depression (r = 0.367, P < 0.001). According to the IRT models, all item discrimination parameters were higher than 1.70, and difficulty parameters ranged from 1.13 to 2.83. Conclusion: The Chinese version of the CAS has good psychometric properties in healthcare workers after China adjusted its COVID-19 management measures during the COVID-19 Omicron epidemic, and it can be used to assess COVID-19-associated anxiety in Chinese healthcare workers.
- Research Article
- 10.25170/ijelt.v1i1.98
- May 1, 2005
- Journal on English Language Teaching
Item response theory (IRT) emerged as a solution to the weaknesses of classical test theory (CTT) and offers several advantages over it. These advantages rest on the requirements of unidimensionality of items, local independence between examinees and items, and examinee-item parameter invariance, all of which are needed in test construction. The TOEFL is widely known as a test that meets these requirements in language testing, which makes it relevant to IRT. This research therefore examines the reading subtest of the TOEFL from an IRT perspective and is designed to estimate examinee and item parameters. Under the one-parameter logistic (1PL) model, the Prox method is employed to estimate the parameters jointly; this is known as joint maximum likelihood estimation. The method requires dichotomous data, so the TOEFL, a well-established test instrument, was chosen. The data comprise 30 examinees, measured by the ability parameter θ, and 20 items, measured by the item difficulty parameter b. Unlike CTT, IRT with the Prox method is able to estimate examinee and item parameters jointly. As a result, the estimated values of θ and b fall within the ranges the model intends, as commonly depicted by the item characteristic curve. Keywords: item response theory, one-parameter logistic model, parameter estimates, the Prox method.
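As a sketch of how the Prox method's normal-approximation logic works, the hypothetical implementation below computes initial log-odds estimates for persons and items and applies the commonly cited PROX expansion factors (1.7² = 2.89 and 2.89² = 8.35). It uses simulated data rather than the TOEFL responses from the study, and it omits the iterative refinements a full analysis would apply.

```python
import numpy as np

def prox(responses):
    """One-pass PROX estimates for a persons x items 0/1 matrix.

    Assumes no person has a zero or perfect score and no item is
    answered correctly by everyone or no one."""
    n_persons, n_items = responses.shape
    item_correct = responses.sum(axis=0)
    person_correct = responses.sum(axis=1)

    # Initial logits: item difficulty and person ability
    d = np.log((n_persons - item_correct) / item_correct)
    d -= d.mean()                                   # centre difficulties at 0
    theta = np.log(person_correct / (n_items - person_correct))

    # Expansion factors correct each facet's logits for the other's spread
    U, V = d.var(), theta.var()
    d_hat = d * np.sqrt((1 + V / 2.89) / (1 - U * V / 8.35))
    theta_hat = theta * np.sqrt((1 + U / 2.89) / (1 - U * V / 8.35))
    return theta_hat, d_hat

# Simulated Rasch data (invented parameters, not the TOEFL data)
rng = np.random.default_rng(1)
theta_true = rng.normal(size=200)
b_true = np.linspace(-1.5, 1.5, 20)
p = 1 / (1 + np.exp(-(theta_true[:, None] - b_true)))
X = (rng.random((200, 20)) < p).astype(float)
keep = (X.sum(axis=1) > 0) & (X.sum(axis=1) < 20)   # drop zero/perfect scores
abil, diff = prox(X[keep])
```

With reasonable sample sizes, the recovered item difficulties track the generating values closely, which is why PROX is often used to seed iterative joint maximum likelihood estimation.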
- Research Article
- 10.1016/s2007-5057(14)72724-3
- Jan 1, 2014
- Investigación en Educación Médica
Virtues and limitations of item response theory for educational assessment in the medical sciences
- Book Chapter
- 10.1007/978-981-10-3302-5_5
- Jan 1, 2016
Classical Test Theory (CTT), also known as the true score theory, refers to the analysis of test results based on test scores. The statistics produced under CTT include measures of item difficulty, item discrimination, measurement error and test reliability. The term “Classical” is used in contrast to “Modern” test theory which usually refers to item response theory (IRT). The fact that CTT was developed before IRT does not mean that CTT is outdated or replaced by IRT. Both CTT and IRT provide useful statistics to help us analyse test data. Generally, CTT and IRT provide complementary results. For many item analyses, CTT may be sufficient to provide the information we need. There are, however, theoretical differences between CTT and IRT, and many researchers prefer IRT because of enhanced measurement properties under IRT. IRT also provides a framework that facilitates test equating, computer adaptive testing and test score interpretation. While this book devotes a large part to IRT, we stress that CTT is an important part of the methodologies for educational and psychological measurement. In particular, the exposition of the concept of reliability in CTT sets the basis for evaluating measuring instruments. A good understanding of CTT lays the foundations for measurement principles. There are other approaches to measurement such as generalizability theory and structural equation modelling, but these are not the focus of attention in this book.
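The CTT statistics listed above can be computed in a few lines. The following sketch is illustrative only (the function names and simulated data are invented): item difficulty as the proportion correct, discrimination as the corrected item-total point-biserial correlation, and reliability as Cronbach's alpha.

```python
import numpy as np

def ctt_item_stats(X):
    """Item difficulty (proportion correct) and corrected item-total
    point-biserial discrimination for a persons x items 0/1 matrix."""
    difficulty = X.mean(axis=0)
    total = X.sum(axis=1)
    discrimination = np.array(
        [np.corrcoef(X[:, i], total - X[:, i])[0, 1] for i in range(X.shape[1])]
    )
    return difficulty, discrimination

def cronbach_alpha(X):
    """Internal-consistency reliability of the summed score."""
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

# Simulated 0/1 responses driven by a common ability, so items correlate
rng = np.random.default_rng(2)
ability = rng.normal(size=300)
X = (rng.random((300, 10)) < 1 / (1 + np.exp(-ability[:, None] + 0 * np.arange(10)))).astype(float)

difficulty, discrimination = ctt_item_stats(X)
alpha = cronbach_alpha(X)
```

Excluding the focal item from the total before correlating (the "corrected" point-biserial) avoids inflating discrimination by correlating an item with itself.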
- Research Article
- 10.1111/j.1365-2923.2009.03425.x
- Dec 16, 2009
- Medical Education
A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. The tenets of CTT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
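The building block of IRT-based item analysis is the item characteristic curve and its information function. The sketch below (parameter values invented for illustration) shows the 2PL response probability and the corresponding item information, which peaks where ability equals item difficulty:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information contributed by the item at ability theta."""
    p = icc_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 121)
info = item_information(theta, a=1.2, b=0.5)
peak = theta[np.argmax(info)]   # information is maximal near theta = b
```

Because a test's information is the sum of its items' information, curves like this drive item selection, test assembly, and computer adaptive testing.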
- Research Article
- 10.6145/jme201302
- Mar 1, 2013
Background: Item analysis is used to ensure the validity of a test. Classical Test Theory (CTT) and Item Response Theory (IRT) are the two main item analysis theories. Objective: This study discussed and compared the advantages and disadvantages of CTT and IRT in screening out potentially problematic test items. Expert opinion and student feedback were also considered before removing truly problematic items. The study aimed to develop an item analysis procedure to ensure classroom test validity. Method: Eighty-six sixth-year medical students answered a newly developed authentic medical test composed of 48 multiple-choice questions. For item analysis, this study used CTT and IRT methods for the quantitative analyses, while expert opinion and student feedback were used for the qualitative ones. Cronbach's alpha was used as the coefficient of internal consistency for the whole test. Results: The Cronbach's alpha of the responses to all 48 items was 0.55. Using IRT, 4 items were deleted and the alpha increased to 0.57. Using CTT, 24 items were deleted and the alpha increased to 0.70. Using IRT and CTT together with expert opinion, 21 items were deleted and the alpha increased to 0.71. Conclusions: Both CTT and IRT help to increase test reliability. Compared with IRT, CTT is more effective at increasing test reliability. Moreover, expert opinion and student feedback offer valuable suggestions for item selection. A procedure combining CTT, expert opinion, and student feedback is therefore recommended for item selection.
- Research Article
- 10.1016/j.clinthera.2014.04.006
- May 1, 2014
- Clinical Therapeutics
Overview of Classical Test Theory and Item Response Theory for the Quantitative Assessment of Items in Developing Patient-Reported Outcomes Measures