Perth Alexithymia Questionnaire
Abstract: The level of alexithymia affects a person’s mental and physical health. Nevertheless, this trait is still measured using unreliable tools. The Perth Alexithymia Questionnaire (PAQ) is a promising measuring tool, but no study has examined its psychometric characteristics using Item Response Theory (IRT). Thus, we aimed to explore the psychometric features of the PAQ using IRT and Classical Test Theory. A sample of Czech adults (n = 848, age: M = 34.95, SD = 11.90, females: 81.13%) participated in an online survey. We measured alexithymia, empathy, sensory processing sensitivity (SPS), neuroticism, anxiety, and depression. Confirmatory factor analysis provided evidence of the good fit of a five-factor solution. Reliability of the PAQ was good (omega = 0.70–0.96). Measurement invariance testing revealed that the PAQ measures alexithymia invariantly between people who are single and those in a partnership. Partial invariance was found across sex. The PAQ items had high discrimination, and their measurement precision was highest in individuals with above-average alexithymia. Higher alexithymia was present in males. Finally, alexithymia was positively associated with depression, anxiety, and neuroticism, and with SPS after neuroticism was taken into account. In conclusion, the PAQ represents a reliable and valid instrument for assessing alexithymia.
- Front Matter
34
- 10.1016/s1551-7144(09)00212-2
- Jan 1, 2010
- Contemporary Clinical Trials
Classical and modern measurement theories, patient reports, and clinical outcomes
- Discussion
7
- 10.1186/s41687-019-0134-1
- Jul 30, 2019
- Journal of Patient-Reported Outcomes
Background: Psychometric analyses of patient-reported outcomes typically use either classical test theory (CTT), item response theory (IRT), or Rasch measurement theory (RMT). The three papers from the ISOQOL Psychometrics SIG examined the same data set using the three different approaches. By comparing the results from these papers, the current paper aims to examine the extent to which conclusions about the validity and reliability of a PRO tool depend on the selected psychometric approach. Main text: Regarding the basic statistical model, IRT and RMT are relatively similar but differ notably from CTT. However, modern applications of CTT diminish these differences. In analyses of item discrimination, CTT and IRT gave very similar results, while RMT requires equal discrimination and therefore suggested exclusion of items deviating too much from this requirement. Thus, fewer items fitted the Rasch model. In analyses of item thresholds (difficulty), IRT and RMT provided fairly similar results. Item thresholds are typically not evaluated in CTT. Analyses of local dependence showed only moderate agreement between methods, partly due to different thresholds for important local dependence. Analyses of differential item functioning (DIF) showed good agreement between IRT and RMT. Agreement might be further improved by adjusting the thresholds for important DIF. Analyses of measurement precision across the score range showed high agreement between IRT and RMT methods. CTT assumes constant measurement precision throughout the score range and thus gave different results. Category orderings were examined in RMT analyses by checking for reversed thresholds. However, this approach is controversial within the RMT society. The same issue can be examined by the nominal categories IRT model. Conclusions: While there are well-known differences between CTT, IRT and RMT, the comparison between three actual analyses revealed a great deal of agreement between the results from the methods.
If the undogmatic attitude of the three current papers is maintained, the field will be well served.
- Book Chapter
14
- 10.1007/978-981-10-3302-5_5
- Jan 1, 2016
Classical Test Theory (CTT), also known as the true score theory, refers to the analysis of test results based on test scores. The statistics produced under CTT include measures of item difficulty, item discrimination, measurement error and test reliability. The term “Classical” is used in contrast to “Modern” test theory which usually refers to item response theory (IRT). The fact that CTT was developed before IRT does not mean that CTT is outdated or replaced by IRT. Both CTT and IRT provide useful statistics to help us analyse test data. Generally, CTT and IRT provide complementary results. For many item analyses, CTT may be sufficient to provide the information we need. There are, however, theoretical differences between CTT and IRT, and many researchers prefer IRT because of enhanced measurement properties under IRT. IRT also provides a framework that facilitates test equating, computer adaptive testing and test score interpretation. While this book devotes a large part to IRT, we stress that CTT is an important part of the methodologies for educational and psychological measurement. In particular, the exposition of the concept of reliability in CTT sets the basis for evaluating measuring instruments. A good understanding of CTT lays the foundations for measurement principles. There are other approaches to measurement such as generalizability theory and structural equation modelling, but these are not the focus of attention in this book.
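The CTT statistics named in this chapter summary (item difficulty, item discrimination, test reliability) can be illustrated with a minimal Python sketch. The response matrix below is hypothetical, the function names are illustrative rather than taken from the book, and Cronbach's alpha is assumed here as one common reliability coefficient.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_difficulty(responses):
    """CTT difficulty index: proportion of correct answers per item."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]

def item_discrimination(responses, j):
    """Corrected item-total correlation: item j vs. the sum of the other items."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    return pearson(item, rest)

def cronbach_alpha(responses):
    """Cronbach's alpha computed from item variances and total-score variance."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 6 examinees (rows) x 4 dichotomous items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

print(item_difficulty(responses))
print(item_discrimination(responses, 2))
print(cronbach_alpha(responses))
```

All three statistics are computed from the observed scores of this particular sample, which is the sample dependence that IRT-based item parameters are meant to reduce.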
- Research Article
1
- 10.1177/008124639002000408
- Dec 1, 1990
- South African Journal of Psychology
One of the great advantages of the modern approach to item and test analysis, namely, the Item Response Theory (IRT) over the traditional Classical Test Theory (CTT) is that item statistics such as the difficulty index and the discrimination index are not sample dependent. Very few research findings using IRT models have been published locally. Hoffman is one of the small group of researchers who have implemented IRT models in South Africa. The purpose of the present study was to compare CTT and IRT item and test analysis. Data from the multiple choice part of the 1988 examination paper on research methodology for 538 undergraduate UNISA Industrial Psychology students were used. Results showed that corresponding interpretations of item statistics (CTT) and item parameters (IRT) were possible under the two approaches. More or less the same conclusions could also be drawn with regard to evaluation of test statistics. The near-perfect correlation between test scores (CTT) and ability estimates (IRT) for the one-parameter model indicates that a great similarity exists between them.
- Research Article
4
- 10.1007/s40037-020-00586-0
- May 28, 2020
- Perspectives on Medical Education
Introduction: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score. Methods: We analyzed 32 multiple-choice exams from three Swiss medical schools comparing conditional reliability at the cut score in IRT and CTT. Additionally, we analyzed potential influencing factors such as the range of examinees’ performance, year of study, and number of items using multiple regression. Results: In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT compared with CTT. It was influenced by the range of examinees’ performance and number of items. This influence was more pronounced in CTT. Discussion: We found that conditional reliability shows inverse distributions, and leads to different conclusions regarding the measurement precision at the cut score, depending on the theory used. As the use of IRT seems to be more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments.
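The opposite shapes described in this abstract can be reproduced with a small Python sketch. It makes two assumptions that the abstract does not confirm: the CTT side uses Lord's binomial-error standard error of measurement (SEM) for number-correct scores, and the IRT side uses a Rasch model with hypothetical item difficulties centered on zero. The two SEMs are on different metrics (raw score versus θ), so only their shapes across the scale are comparable.

```python
import math

def lord_binomial_sem(raw_score, n_items):
    """CTT side: Lord's binomial-error SEM for a number-correct score."""
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_test_information(theta, difficulties):
    """Test information: sum of p * (1 - p) over items."""
    probs = (rasch_prob(theta, b) for b in difficulties)
    return sum(p * (1.0 - p) for p in probs)

def irt_sem(theta, difficulties):
    """IRT side: SEM is the inverse square root of test information."""
    return 1.0 / math.sqrt(rasch_test_information(theta, difficulties))

# Hypothetical 40-item exam: CTT error is largest for mid-range raw scores.
print(lord_binomial_sem(20, 40), lord_binomial_sem(38, 40))

# Hypothetical difficulties centered on 0: IRT error is smallest mid-scale.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(irt_sem(0.0, difficulties), irt_sem(3.0, difficulties))
```

Under these assumptions, a cut score near the middle of the scale falls where the binomial SEM peaks but where Rasch information is highest, which matches the direction of the reported difference.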
- Research Article
65
- 10.1186/1471-2288-10-24
- Mar 25, 2010
- BMC Medical Research Methodology
Background: Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT) based on the observed scores and models coming from Item Response Theory (IRT). However, whether IRT or CTT would be the most appropriate method to analyse PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known, to a certain extent for item parameters, from good to poor precision, or unknown and therefore had to be estimated. The powers obtained with IRT or CTT were compared and parameters having the strongest impact on them were identified. Results: When person parameters were assumed to be unknown and item parameters to be either known or not, the powers achieved using IRT or CTT were similar and always lower than the expected power using the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take account of the number of items to obtain an accurate formula.
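The abstract does not print the "well-known sample size formula", so the Python sketch below assumes the standard normal-approximation formula for a two-sided comparison of two group means with equal variances and equal group sizes: n per group = 2 (z_{1-α/2} + z_{1-β})² / d², where d is the standardized effect size. The effect size used is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation, equal variances, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Hypothetical standardized effect size of 0.5 at alpha = 0.05, power = 0.80.
print(math.ceil(n_per_group(0.5)))  # 63
```

The formula has no term for the number of items, which is consistent with the abstract's conclusion that an accurate formula for IRT would need to account for it.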
- Research Article
- 10.6145/jme201302
- Mar 1, 2013
- 醫學教育
Background: Item analysis is used to ensure the validity of a test. Classical Test Theory (CTT) and Item Response Theory (IRT) are the two main item analysis theories. Objective: This study discussed and compared the advantages and disadvantages of CTT and IRT in screening out potentially problematic test items. Expert opinion and student feedback were also considered before removal of truly problematic items. The study aimed to develop an item analysis procedure to ensure classroom test validity. Method: Eighty-six sixth-year medical students answered a newly developed authentic medical test composed of 48 multiple-choice questions. For item analysis, this study used CTT and IRT methods for the quantitative analysis, while expert opinion and student feedback were used for the qualitative analysis. Cronbach's alpha was used as the coefficient of internal consistency for the whole test. Results: The Cronbach's alpha of the responses to all 48 items in the test was 0.55. Using IRT, 4 items were deleted and the alpha increased to 0.57. Using CTT, 24 items were deleted and the alpha increased to 0.70. Using IRT and CTT as well as expert opinion, 21 items were deleted and the alpha increased to 0.71. Conclusions: Both CTT and IRT help to increase test reliability. Compared to IRT, CTT is more effective at increasing test reliability. Moreover, expert opinion and student feedback offer valuable suggestions for item selection. A procedure based on CTT, expert opinion, and student feedback is worth considering for item selection.
- Book Chapter
33
- 10.4324/9781315736013.ch15
- Dec 16, 2014
The ultimate goal of measurement is to produce a score by which individuals can be assessed and differentiated. Item response theory (IRT) modeling views responses to test items as indicators of a respondent’s standing on some underlying psychological attributes (van der Linden & Hambleton, 1997) – we often call them latent traits – and devises special algorithms for estimating this standing. This chapter gives an overview of methods for estimating person attribute scores using one-dimensional and multi-dimensional IRT models, focusing on those that are particularly useful with patient-reported outcome (PRO) measures. To be useful in applications, a test score has to approximate the latent trait well, and importantly, the precision level must be known in order to produce information for decision-making purposes. Unlike classical test theory (CTT), which assumes that the precision with which a test measures is the same for all trait levels, IRT methods assess the precision with which a test measures at different trait levels. In the context of patient-reported outcomes measurement, this enables assessment of the measurement precision for an individual patient. Knowing error bands around the patient’s score is important for informing clinical judgments, such as deciding upon the significance of any change, for instance in response to treatment (Reise & Haviland, 2005). At the same time, summary indices are often needed to summarize the overall precision of measurement in a research sample, population group, or in the population as a whole. Much of this chapter is devoted to methods for estimating measurement precision, including the score-dependent standard error of measurement and appropriate sample-level or population-level marginal reliability coefficients. Patient-reported outcome measures often capture several related constructs, a feature that may make the use of multi-dimensional IRT models appropriate and beneficial (Gibbons, Immekus & Bock, 2007).
Several such models are described, including a model with multiple correlated constructs, a model where multiple constructs are underlain by a general common factor (second-order model), and a model where each item is influenced by one general and one group factor (bifactor model). To make the use of these models more easily accessible for applied researchers, we provide specialized formulae for computing test information, standard errors and reliability. We show how to translate a multitude of numbers and graphs conditioned on several dimensions into easy-to-use indices that can be understood by applied researchers and test users alike. All described methods and techniques are illustrated with a single data analysis example involving a popular PRO measure, the 28-item version of the General Health Questionnaire (GHQ28; Goldberg & Williams, 1988), completed in mid-life by a large community sample as a part of a major UK cohort study.
- Research Article
16
- 10.2147/prbm.s413162
- Jul 1, 2023
- Psychology Research and Behavior Management
In 2021, Hall et al developed the Digital Stress Scale (DSS), but its psychometric characteristics were only tested using classical test theory (CTT). In this study, we use item response theory (IRT) and CTT to develop and verify a Chinese version of the DSS and its short version, which can improve the reliability and effectiveness of the digital stress measurement tool for Chinese college students. We developed a Chinese version of the DSS (DSS-C) and recruited 1506 Chinese college students as participants to analyze its psychometric characteristics based on CTT and IRT methods. First, we applied CTT analyses, covering common method bias, construct validity, criterion-related validity, internal consistency, test-retest reliability and measurement invariance. Then, we adopted the IRT approach to examine the item parameters, item characteristics, item information, differential item functioning (DIF), test information, and test reliability of the DSS-C. Finally, a short form (DSS-C-S) was constructed, and the psychometric characteristics of the DSS-C-S were examined. (1) The five-factor structure of the DSS-C was verified. The DSS-C shows good internal reliability, test-retest reliability, criterion-related validity and measurement invariance between urban college students and rural college students. (2) All 24 items had reasonable discrimination parameters and location parameters and were DIF-free by gender. Except for Items 4 and 10, all the other items had high information and measurement reliability at medium θ levels. (3) Compared with the DSS-C, the 22-item short form also has good reliability and validity and maintains sufficient measurement accuracy while reducing items of poor quality. Both the DSS-C and the DSS-C-S have good psychometric characteristics and are accurate and effective tools for measuring the digital stress of Chinese college students.
- Research Article
13
- 10.1093/swr/34.2.94
- Jun 1, 2010
- Social Work Research
The need to develop measures that tap into constructs of interest to social work, refine existing measures, and ensure that measures function adequately across diverse populations of interest is critical. Item response theory (IRT) is a modern measurement approach that is increasingly seen as an essential tool in a number of allied professions. IRT-based measurement uses a model-based approach that has several analytical and explanatory advantages over classical test theory. In particular, IRT-based techniques facilitate the process of specific item selection, allow for increased measurement precision with fewer items, and provide greater capacity for understanding and accounting for measurement bias across diverse populations. A survey of the top (as rated by impact factor) 20 social work journals revealed that few measurement articles in the social work literature use IRT or other modern measurement approaches. The benefit of incorporating more IRT-based approaches for developing, refining, and ensuring the application of measures to diverse populations is discussed.
KEY WORDS: bias; classical test theory; item response theory; measurement; social work
The state of measurement within the social work literature is integrally related to knowledge base development and, ultimately, the extent to which research is able to meaningfully inform practice (Holden, Nizza, & Weissman, 1995). Scholarship highlights at least three measurement-related research domains within the field of social work. The first concerns the development of valid and reliable measures that capture the diverse set of phenomena relevant to social work, particularly those phenomena that may not be adequately represented by existent standardized instruments. The second is the assessment and validation of such measures.
In particular, high-quality intervention research hinges on the validity and reliability of measures used to assess outcomes (Rosen, Proctor, & Staudt, 1999). Third, a growing body of literature challenges the extent to which well-validated measures adequately account and adjust for within- and across-population sources of diversity (see Ramirez, Ford, Stewart, & Teresi, 2005; Snowden, 2003), and such concerns are highly salient to social work's commitment to diversity-sensitive and -responsive research and practice. During the 1980s and 1990s, social work researchers outlined the relative benefits of item response theory (IRT) over classical test theory (CTT) measurement models, calling explicitly for IRT-based models' increased utilization to address measurement problems in social work research (DeRoos & Allen-Meares, 1993, 1998; Nugent & Hankins, 1989, 1992). Indeed, IRT models have largely subsumed CTT approaches within a wide range of allied fields and disciplines (for example, medicine, psychology, nursing, public health, education) (see Dunn, Resnicow, & Klesges, 2006; Embretson & Reise, 2000; Fries, Bruce, & Cella, 2005; Lord, 1980; Ware, Bjorner, & Kosinski, 2000). Given early interest among social work researchers and the recent proliferation of IRT methods within other applied social sciences, our overall objective in the present study was to assess the extent to which these methods are represented within social work research. This review thus realizes three overlapping aims. First, it provides a description and comparison of IRT and CTT models and outlines the potential contributions of IRT methods to social work scholarship; it also briefly discusses IRT more generally as a latent variable model and its overlap with confirmatory factor analytic (CFA) and multi-level modeling methods.
Second, it presents the results of a structured review assessing the penetration of IRT-based methods into the field of social work as reflected in key social work research journals. Third, using these results as a launching point, we highlight particular lines of inquiry within social work research where the application of IRT methods would likely yield substantial innovation. …
- Research Article
15
- 10.1038/s41598-024-72657-9
- Sep 30, 2024
- Scientific Reports
Sensory processing sensitivity (SPS), linked to processing external and internal stimuli, has drawn attention to its associations with clinical factors, particularly with health-related quality of life (HRQOL) variables. This study examined the relationships among SPS, stress, sleep quality, and HRQOL, establishing an explanation model. Eight hundred adults (M = 26.66 years, SD = 12.24; age range: 18–85 years) completed self-administered questionnaires on SPS, stress, sleep quality, and HRQOL. Correlation analysis and structural equation modeling (SEM) were used to analyze HRQOL pathways. Stress positively correlated with sleep quality disturbances (r = 0.442, p < 0.001) and SPS (r = 0.344, p < 0.001). Sleep quality disturbances were weakly positively associated with SPS (r = 0.242, p < 0.001). Weak negative correlations emerged between stress and physical (r = −0.283, p < 0.001) and mental (r = −0.271, p < 0.001) health, the main HRQOL dimensions. SEM results showed that SPS positively influenced sleep quality disturbances (β = 0.242, p < 0.05) and stress (β = 0.413, p < 0.001) while negatively affecting physical health (β = −0.126, p < 0.001). Sleep quality disturbances negatively affected physical (β = −0.168, p < 0.001) and mental (β = −0.189, p < 0.001) health, and stress negatively affected mental health (β = −0.492, p < 0.01). Indirect effects between SPS and physical (β = −0.036, p < 0.001) and mental (β = −0.091, p < 0.001) health through sleep were observed, as well as a mediation of stress between SPS and mental health (β = −0.196, p < 0.001). SPS, sleep quality disturbances, and stress emerged as significant predictors of self-rated physical and mental health in adults.
- Research Article
- 10.1080/09593985.2026.2670424
- May 13, 2026
- Physiotherapy Theory and Practice
Background: Item response theory (IRT), particularly the graded response model, provides a latent trait (θ) that reflects balance ability and offers a complementary framework for psychometric evaluation. Although IRT is recommended for estimating clinical thresholds, its application to cutoff determination for stroke populations remains limited. We hypothesized that IRT-based approaches will yield cutoff values that differ from those derived using classical test theory and provide additional methodological insight for cutoff determination. Purpose: To estimate and compare cutoff values for walking independence in inpatients with stroke using classical test theory and IRT. Methods: Data from 165 inpatients with stroke were analyzed. Balance was assessed using the Brief Balance Evaluation Systems Test (Brief-BESTest). Cutoff scores were calculated using receiver operating characteristic (ROC), IRT combined with ROC (IRT+ROC), prevalence-based IRT method (old IRT), and embedded state-item IRT method (new IRT). Discriminative performance was evaluated using area under the ROC curve (AUC), sensitivity, specificity, likelihood ratios, and accuracy. Results: The cutoff values for walking independence were 12.37 (ROC), 11.46 (IRT+ROC), 10.54 (old IRT), and 7.90 (new IRT). All methods demonstrated comparable performance, with AUC values ranging from 0.769 to 0.794, indicating good discriminative ability of the Brief-BESTest for identifying walking independence. The new IRT method achieved the highest AUC and sensitivity; ROC and IRT+ROC methods achieved similar AUCs. Although cutoff values and classification metrics differed slightly across methods, their confidence intervals overlapped.
Conclusion: To our knowledge, this is the first study to directly compare cutoff estimation methods incorporating IRT in stroke rehabilitation, suggesting that IRT-based approaches can yield cutoff values with stability comparable to that of classical test theory and providing methodological insight for future research and clinical assessment scale development.
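As an illustration of the ROC-based cutoff step only, here is a minimal Python sketch. The abstract does not state which optimality criterion was used, so Youden's J (sensitivity + specificity − 1) is assumed as one common choice. The scores and labels are hypothetical and are not Brief-BESTest data.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) that maximizes Youden's J.

    A case is classified positive when score >= cutoff; labels are 1
    (positive, e.g. independent walker) or 0 (negative).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Strict '>' keeps the lowest cutoff when several share the same J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical balance scores and walking-independence labels.
scores = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

cutoff, sens, spec = youden_cutoff(scores, labels)
print(cutoff, round(sens, 3), spec)  # 11 0.833 1.0
```

The IRT-based variants in the abstract would apply a cutoff rule on the latent trait θ rather than the raw score; that step is not sketched here because the abstract gives no details of those methods.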
- Research Article
1
- 10.22373/pjp.v13i3.25435
- Dec 23, 2024
- PIONIR: JURNAL PENDIDIKAN
This study aims to compare the results of instrument testing using classical test theory and item response theory with the Rasch model, applied to a question instrument on respiratory system material. This study employed descriptive quantitative methodology, with a sample of 36 students. The analyzed instrument consisted of 40 multiple-choice questions on respiratory system material. Instrument analysis utilized classical test theory with Microsoft Excel and item response theory with Winsteps Rasch ver 4.5.2.0. The data analyses from classical test theory and item response theory offer slightly different interpretations but are mutually complementary. Both classical test theory and item response theory may assess the validity, reliability, distractor effectiveness, difficulty level, and discriminating power of questions. Item response theory provides a comprehensive analysis of test results through the use of the Wright map, a bar-style display that helps determine a student's ability relative to the difficulty level of the question. The scalogram is used to identify patterns in students’ responses, allowing for the detection of cheating and inaccuracies in answering questions. Additionally, DIF analysis is employed to identify item bias. This study concludes that any developed instrument must possess the characteristics that meet the requirements to measure competency effectively. The requirements for an instrument can be analyzed using item response theory with the Rasch model, which provides in-depth interpretation. Keywords: Classical test theory, item response theory, Rasch model, instrument
- Research Article
13
- 10.1016/j.jad.2020.05.090
- May 27, 2020
- Journal of Affective Disorders
Assessment of Geriatric Depression Scale's Applicability in Longevous Persons based on Classical Test and Item Response Theory
- Book Chapter
7
- 10.1007/978-94-007-4507-0_9
- Jan 1, 2012
Item response theory (IRT) and classical test theory (CTT) are invaluable tools for the construction of assessment instruments and the measurement of student proficiencies in educational settings. However, the advantages of IRT over CTT are not always clear. This chapter uses an example item analysis to contrast IRT and CTT. It is hoped that the readers can gain a deeper understanding of IRT through comparisons of similarities and differences between IRT and CTT statistics. In particular, this chapter discusses item properties such as the difficulty and discrimination power of items, as well as person ability measures contrasting the weighted likelihood estimates and plausible values in non-technical ways. The main advantage of IRT over CTT is outlined through a discussion on the construction of a developmental scale on which individual students are located. Further, some limitations of both IRT and CTT are brought to light to guide the valid use of IRT and CTT results. Lastly, the IRT software program, ConQuest (Wu et al. ACERConQuest version 2: Generalised item response modelling software. Australian Council for Educational Research, Camberwell, 2007), is used to run the item analysis to illustrate some of the program’s functionalities.