Knowledge on Indoor Air Quality (K-IAQ): Development and Evaluation of a Questionnaire Through the Application of Item Response Theory
Indoor air pollution is a major cause of noncommunicable diseases, and increasing people's knowledge of the related risks is a key preventive action. Many studies describe questionnaires for evaluating knowledge of indoor air quality, but these often involve selected population groups and take time to complete. This study describes the validation of an ad hoc questionnaire designed to be quick to complete, reliable, and valid. The validation process integrated two psychometric approaches: Classical Test Theory (CTT), which uses the Kuder–Richardson 20 (KR-20) formula to measure the internal consistency and reliability of the questionnaire as a whole, and Item Response Theory (IRT), which evaluates the validity of each statement (item). The questionnaire, distributed via social media to a self-selected sample, reached 621 subjects. Its internal consistency was satisfactory, with a KR-20 value of 0.74 (CI 0.71–0.77). The IRT analysis showed that the statements can distinguish between high-performing and low-performing interviewees, since 100% of the items had a discrimination parameter aj within or above the recommended range. In terms of difficulty, just over half of the statements (53.3%) showed a low level of difficulty (a low difficulty parameter bj), while another 20% showed a high level of difficulty. Regarding the pseudo-guessing parameter, known as the c-parameter, a probability of answering correctly despite low performance was observed for three items (1, 6, and 9), and the same statements fell outside the recommended range for all three IRT parameters. Applying IRT highlighted critical questions that would not have emerged using the CTT approach alone. Although the questionnaire is acceptable overall, it will be appropriate to evaluate whether to revise or exclude the critical questions in order to improve the instrument's performance.
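For readers who want to see how the KR-20 statistic reported above is computed, here is a minimal Python sketch. The response matrix is simulated purely for illustration (621 respondents, 15 dichotomous items); it is not the study's data, and the item count is an assumption.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson 20 for a respondents-by-items matrix of 0/1 scores."""
    k = responses.shape[1]                         # number of items
    p = responses.mean(axis=0)                     # proportion correct per item
    q = 1.0 - p                                    # proportion incorrect per item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Illustrative data only: simulated correct/incorrect answers.
rng = np.random.default_rng(0)
ability = rng.normal(size=(621, 1))
difficulty = rng.normal(size=(1, 15))
prob_correct = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
answers = (rng.random((621, 15)) < prob_correct).astype(int)

print(f"KR-20 = {kr20(answers):.2f}")
```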
- Research Article
- 10.21009/jsa.09104
- Jun 30, 2025
- Jurnal Statistika dan Aplikasinya
Item Response Theory (IRT) is an approach for analyzing the responses given by respondents to a measurement instrument. Unlike the Classical Test Theory (CTT) approach, which measures respondents' latent traits from the total score, IRT measures latent traits from the responses given to each individual item. Another difference is that CTT is theory-based while IRT is model-based. The purpose of this study is to apply IRT to analyze the item responses of employees of the Kementerian Desa, Pembangunan Daerah Tertinggal, dan Transmigrasi (KDPDTT, the Ministry of Villages, Development of Disadvantaged Regions and Transmigration) to the items in a questionnaire administered to them, in order to understand their attitudes toward change management and organizational culture in the KDPDTT. We applied IRT to the answers provided by the respondents, modelling the responses with the dichotomous 1PL, 2PL, and 3PL models. The IRT modelling in this study is based on the results of a survey conducted by the KDPDTT in 2020. Among the three models, the 2PL model is the most suitable for our item-response data because it has the smallest AIC, BIC, and G2. Based on the 2PL model, the probability of endorsing the items related to change management ranges from 0.68 to 0.95, while the probability of endorsing the items related to organizational culture ranges from 0.87 to 0.98. Although each item has three response options, namely "disagree", "undecided (neutral)", and "agree", we treat them as dichotomous, classifying "undecided" answers in the "disagree" category. The reason is that many Indonesian respondents find it hard to say "disagree" to a question evaluating a policy and tend to feel safer choosing "undecided". The item responses analyzed in our study are therefore dichotomous: "agree" or "disagree". The novelty of this research is the use of a non-classical approach, IRT, which has several advantages over CTT, including that item characteristics do not depend on respondent characteristics, and vice versa.
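A minimal sketch of the dichotomization step and the 2PL endorsement probability described above, in Python. The response labels, item parameters, and AIC helper are hypothetical illustrations, not the KDPDTT survey's actual values or fitting code.

```python
import math

# Collapse the three response options to a dichotomy, as the study describes:
# "undecided" is grouped with "disagree".
def dichotomize(response: str) -> int:
    return 1 if response == "agree" else 0   # "disagree" and "undecided" -> 0

def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing an item given latent trait theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, used to compare the 1PL/2PL/3PL fits."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical item: fairly discriminating (a = 1.2) and easy to endorse (b = -1.0).
print(dichotomize("undecided"))                     # 0
print(round(p_2pl(theta=0.0, a=1.2, b=-1.0), 2))    # ~0.77
```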
- Research Article
29
- 10.1002/sim.4153
- Dec 28, 2010
- Statistics in Medicine
Health sciences frequently deal with Patient Reported Outcomes (PRO) data for the evaluation of concepts, in particular health-related quality of life, which cannot be directly measured and are often called latent variables. Two approaches are commonly used for the analysis of such data: Classical Test Theory (CTT) and Item Response Theory (IRT). Longitudinal data are often collected to analyze the evolution of an outcome over time. The most adequate strategy to analyze longitudinal latent variables, which can be either based on CTT or IRT models, remains to be identified. This strategy must take into account the latent characteristic of what PROs are intended to measure as well as the specificity of longitudinal designs. A simple and widely used IRT model is the Rasch model. The purpose of our study was to compare CTT and Rasch-based approaches to analyze longitudinal PRO data regarding type I error, power, and time effect estimation bias. Four methods were compared: the Score and Mixed models (SM) method based on the CTT approach, the Rasch and Mixed models (RM), the Plausible Values (PV), and the Longitudinal Rasch model (LRM) methods all based on the Rasch model. All methods have shown comparable results in terms of type I error, all close to 5 per cent. LRM and SM methods presented comparable power and unbiased time effect estimations, whereas RM and PV methods showed low power and biased time effect estimations. This suggests that RM and PV methods should be avoided to analyze longitudinal latent variables.
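For reference, the Rasch model on which the RM, PV, and LRM methods above are based gives the probability that subject i endorses item j in terms of a single person parameter θᵢ and an item difficulty bⱼ:

```latex
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
```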
- Research Article
- 10.46606/eajess2020v01i03.0045
- Dec 26, 2020
- EAST AFRICAN JOURNAL OF EDUCATION AND SOCIAL SCIENCES
This study sought to establish the effect of knowledge of Classical Test Theory (CTT) and Item Response Theory (IRT), and of the school assessment environment, on assessment practice among teachers of science and mathematics subjects in Eastern Uganda secondary schools. The study assessed the levels of knowledge and application of CTT and IRT in assessment, examined the suitability of the school environment for assessment, and established the influence of school environment and knowledge of CTT and IRT on teachers' engagement in assessment. A census of 307 teachers of science and mathematics subjects attending SESEMAT training in Eastern Uganda participated in the study. The results revealed that the teachers were engaged in assessment (M = 17.04, SD = 2.00) and had moderate levels of knowledge of CTT (M = 10.19, SD = 2.23) and IRT (M = 17.5, SD = 3.50). Their levels of application of CTT (M = 28.08, SD = 3.85) and IRT (M = 6.86, SD = 1.47) were also moderate. The teachers reported that their schools had somewhat conducive environments for assessment (M = 14.37, SD = 3.44). In addition, school environment had the strongest influence on teachers' assessment practices (β = .211, t = 7.212, p < .05); knowledge of CTT also influenced teachers' assessment practice, but less strongly than the environment (β = .112, t = 4.969, p < .05). In conclusion, enhancing the levels of knowledge and application of CTT and IRT, as well as improving the school assessment environment, is paramount for meaningful engagement in assessment by teachers. The study recommended pre-service and in-service training of teachers in CTT and IRT, in addition to schools improving their environments, for effective teacher engagement and quality assessment.
- Research Article
10
- 10.1080/0142159x.2022.2077716
- Jun 1, 2022
- Medical Teacher
Background: Validation of examinations is usually based on classical test theory. In this study, we analysed a key feature examination according to item response theory and compared the results with those of a classical test theory approach. Methods: Over the course of five years, 805 fourth-year undergraduate students took a key feature examination on general medicine consisting of 30 items. Analyses were run according to a classical test theory approach as well as using item response theory. Classical test theory analyses are reported as item difficulty, discriminatory power, and Cronbach's alpha, while item response theory analyses are presented as item characteristic curves, item information curves, and a test information function. Results: According to the classical test theory findings, the examination was labelled as easy. Analyses according to item response theory more specifically indicated that the examination was most suited to identifying struggling students. Furthermore, the analysis allowed for adapting the examination to specific ability ranges by removing items, as well as comparing multiple samples with varying ability ranges. Conclusions: Item response theory analyses revealed results not yielded by classical test theory. Thus, both approaches should be routinely combined to increase the information yield of examination data.
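The item information curves and test information function mentioned above have a simple closed form under the 2PL model. The Python sketch below uses hypothetical item parameters, not the actual key feature examination items, to show how test information identifies the ability range where an exam measures best (e.g. low θ for "struggling students" when most items are easy).

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Hypothetical parameters for an "easy" exam: most difficulties well below 0.
a_params = np.array([1.0, 1.3, 0.9, 1.1])
b_params = np.array([-1.5, -1.0, -2.0, -0.5])

theta_grid = np.linspace(-3, 3, 7)
test_information = sum(
    item_information_2pl(theta_grid, a, b) for a, b in zip(a_params, b_params)
)

for theta, info in zip(theta_grid, test_information):
    print(f"theta = {theta:+.1f}  test information = {info:.2f}")
```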
- Research Article
59
- 10.1016/j.im.2016.06.005
- Jun 7, 2016
- Information & Management
Breaking free from the limitations of classical test theory: Developing and measuring information systems scales using item response theory
- Front Matter
27
- 10.1016/s1551-7144(09)00212-2
- Jan 1, 2010
- Contemporary Clinical Trials
Classical and modern measurement theories, patient reports, and clinical outcomes
- Research Article
- 10.6145/jme201302
- Mar 1, 2013
Background: Item analysis is used to ensure the validity of a test. Classical Test Theory (CTT) and Item Response Theory (IRT) are the two main item analysis theories. Objective: This study discussed and compared the advantages and disadvantages of CTT and IRT in screening out potentially problematic test items. Expert opinion and student feedback were also considered before removing truly problematic items. The study aimed to develop an item analysis procedure to ensure classroom test validity. Method: Eighty-six sixth-year medical students answered a newly developed authentic medical test composed of 48 multiple-choice questions. For item analysis, this study used CTT and IRT methods for the quantitative analysis, and expert opinion and student feedback for the qualitative one. Cronbach's alpha was used as the coefficient of internal consistency of the whole test. Results: The Cronbach's alpha of the responses to all 48 items was 0.55. Using IRT, 4 items were deleted and the alpha increased to 0.57. Using CTT, 24 items were deleted and the alpha increased to 0.70. Using IRT and CTT together with expert opinion, 21 items were deleted and the alpha increased to 0.71. Conclusions: Both CTT and IRT help to increase test reliability. Compared to IRT, CTT is more effective at increasing test reliability. Moreover, expert opinion and student feedback offer valuable suggestions for item selection. Item selection based on CTT combined with expert opinion and student feedback is therefore a sound procedure.
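The "alpha after deleting an item" logic used above to screen problematic items can be reproduced with a few lines of code. This Python sketch uses a simulated 86 × 48 response matrix for illustration only; it is not the study's data or item selection procedure.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(1)
data = (rng.random((86, 48)) < 0.6).astype(int)   # illustrative responses only

baseline = cronbach_alpha(data)
for j in range(data.shape[1]):
    reduced = np.delete(data, j, axis=1)           # drop item j
    gain = cronbach_alpha(reduced) - baseline
    if gain > 0:                                   # dropping this item would raise alpha
        print(f"item {j + 1}: alpha would rise by {gain:.3f}")
```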
- Research Article
- 10.48175/ijarsct-13119
- Oct 11, 2023
- International Journal of Advanced Research in Science, Communication and Technology
In educational and psychological testing, there are two major theories through which tests can be developed, validated, and ultimately used to assess examinees' performance: Classical Test Theory (CTT) and Item Response Theory (IRT), with their corresponding models. IRT came into existence to provide a probabilistic approach that overcomes some of the inherent limitations of CTT and maximizes objectivity in educational assessment. IRT is a quantitative approach to testing the reliability and validity of an instrument based on its items, with statistics for evaluating individual items from a quantitative perspective. The purpose of this paper is to describe in detail the application of item response theory in test item development and analysis. The reason for applying IRT is to obtain test items that yield a reasonable degree of reliability. The statistics used in this respect are the item difficulty parameter, a measure of the proportion of testees who responded to an item correctly; the item discrimination parameter, a measure of how well an item discriminates between examinees with high and low levels of knowledge or ability; and the pseudo-guessing parameter, which expresses the probability that an examinee with low ability answers an item correctly. The acceptable ranges of values of these parameters were also discussed.
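The three parameters described above enter the standard three-parameter logistic (3PL) model, in which the probability that an examinee of ability θ answers item j correctly is:

```latex
P_j(\theta) = c_j + (1 - c_j)\,\frac{1}{1 + e^{-a_j(\theta - b_j)}}
```

Here aj is the discrimination parameter, bj the difficulty parameter, and cj the pseudo-guessing parameter (the lower asymptote reached by very low-ability examinees).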
- Research Article
4
- 10.3352/jeehp.2005.2.1.23
- Jun 30, 2005
- Journal of Educational Evaluation for Health Professions
To test the applicability of item response theory (IRT) to the Korean Nurses' Licensing Examination (KNLE), item analysis was performed after testing the unidimensionality and goodness-of-fit. The results were compared with those based on classical test theory. The results of the 330-item KNLE administered to 12,024 examinees in January 2004 were analyzed. Unidimensionality was tested using DETECT and the goodness-of-fit was tested using WINSTEPS for the Rasch model and Bilog-MG for the two-parameter logistic model. Item analysis and ability estimation were done using WINSTEPS. Using DETECT, Dmax ranged from 0.1 to 0.23 for each subject. The mean square value of the infit and outfit values of all items using WINSTEPS ranged from 0.1 to 1.5, except for one item in pediatric nursing, which scored 1.53. Of the 330 items, 218 (42.7%) were misfit using the two-parameter logistic model of Bilog-MG. The correlation coefficients between the difficulty parameter using the Rasch model and the difficulty index from classical test theory ranged from 0.9039 to 0.9699. The correlation between the ability parameter using the Rasch model and the total score from classical test theory ranged from 0.9776 to 0.9984. Therefore, the results of the KNLE fit unidimensionality and goodness-of-fit for the Rasch model. The KNLE should be a good sample for analysis according to the IRT Rasch model, so further research using IRT is possible.
- Research Article
- 10.1016/j.actpsy.2025.105280
- Aug 1, 2025
- Acta psychologica
Psychometric evaluation of the behavioral inhibition/activation system scales in older adults in Mainland China: A classical test theory and item response theory approach.
- Research Article
3
- 10.1177/0033294119884011
- Oct 23, 2019
- Psychological Reports
Classical test theory does not have a clear superiority over the item response theory (IRT) framework, as these approaches are meant to address different kinds of objectives. However, the use of the IRT framework makes it possible to take into account two different parameters in the assessment of coping willingness: the extent to which individuals declare that they use the different strategies and the level of difficulty of these strategies. Also, the IRT framework is strong enough to cope with inconsistent behaviors or missing data and can take into account the social, legal, and cultural influences on the ability to cope of respondents. The data set used in this study was obtained from different areas at risk from coastal flooding located in France. The sample was composed of 315 adult participants (mean age = 47; standard deviation = 15). In the present case, it appeared that just 10 items from an initial pool of 23 were sufficient to assess active and passive coping willingness because these had a good discriminatory power. Also, it appeared that the estimation of participants' level of coping willingness was linked to their risk perception and anxiety toward the risk. This result has several implications. Firstly, if the IRT calibration is well performed, IRT can be used to compare scores across assessments with different properties and difficulties/locations. Also, the maximum likelihood estimate of participants' level of active and passive coping willingness using an IRT model makes it possible to study the links between coping willingness and other factors of interest.
- Research Article
1
- 10.18421/tem111-26
- Feb 28, 2022
- TEM Journal
A quiz is one of the methods used to test students' abilities. Ability scores can be estimated using Classical Test Theory (CTT) and Item Response Theory (IRT) approaches; IRT is more sensitive to the item characteristics used to estimate students' ability. This study implements IRT in a quiz assessment system, compares the results of CTT and IRT, and then analyses the functionality and usability of the built system. The system was tested by 50 respondents using 30 question items with 5 answer choices. The collected data are automatically scored by the system using CTT and IRT calculations. The results showed differences in ranking between the CTT and IRT methods: using the IRT method, 14 respondents rose in rank, 19 respondents fell in rank, and the remaining 17 respondents did not change rank. The functionality of the built system was tested with the black-box testing method and passed all test functions. The usability of the system was also tested with 23 respondents using the System Usability Scale (SUS) questionnaire, and the SUS score showed that the system is acceptable to users.
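The ranking differences reported above arise because the CTT score is simply the number of correct answers, whereas the IRT ability estimate weights items by their parameters. The Python sketch below contrasts the two for hypothetical response patterns under an assumed 2PL model with a grid-search maximum-likelihood estimate; it is not the system's actual scoring code.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ml_ability(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Crude maximum-likelihood ability estimate via grid search."""
    log_lik = [
        np.sum(responses * np.log(p_correct(t, a, b))
               + (1 - responses) * np.log(1 - p_correct(t, a, b)))
        for t in grid
    ]
    return grid[int(np.argmax(log_lik))]

# Hypothetical 5-item test with varying discrimination and difficulty.
a = np.array([0.5, 0.8, 1.2, 1.5, 2.0])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

# Two students with the same CTT total score (3 correct) but different patterns.
student_1 = np.array([1, 1, 1, 0, 0])   # correct on the easy items
student_2 = np.array([0, 0, 1, 1, 1])   # correct on the hard, discriminating items

print("CTT scores:", student_1.sum(), student_2.sum())
print("IRT abilities:",
      round(ml_ability(student_1, a, b), 2),
      round(ml_ability(student_2, a, b), 2))
```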
- Research Article
63
- 10.1037/pas0000597
- Dec 1, 2019
- Psychological Assessment
Item response theory (IRT) is moving to the forefront of methodologies used to develop, evaluate, and score clinical measures. Funding agencies and test developers are routinely supporting IRT work, and the theory has become closely tied to technological advances within the field. As a result, familiarity with IRT has grown increasingly relevant to mental health research and practice. But to what end? This article reviews advances in applications of IRT to clinical measurement in an effort to identify tangible improvements that can be attributed to the methodology. Although IRT shares similarities with classical test theory and factor analysis, the approach has certain practical benefits, but also limitations, when applied to measurement challenges. Major opportunities include the use of computerized adaptive tests to prevent conditional measurement error, multidimensional models to prevent misinterpretation of scores, and analyses of differential item functioning to prevent bias. Whereas these methods and technologies were once only discussed as future possibilities, they are now accessible because of recent support of IRT-focused clinical research. Despite this, much work still remains in widely disseminating methods and technologies from IRT into mental health research and practice. Clinicians have been reluctant to fully embrace the approach, especially in terms of prospective test development and adaptive item administration. Widespread use of IRT technologies will require continued cooperation among psychometricians, clinicians, and other stakeholders. There are also many opportunities to expand the methodology, especially with respect to integrating modern measurement theory with models from personality and cognitive psychology as well as neuroscience.
- Book Chapter
9
- 10.1007/978-981-10-3302-5_5
- Jan 1, 2016
Classical Test Theory (CTT), also known as the true score theory, refers to the analysis of test results based on test scores. The statistics produced under CTT include measures of item difficulty, item discrimination, measurement error and test reliability. The term “Classical” is used in contrast to “Modern” test theory which usually refers to item response theory (IRT). The fact that CTT was developed before IRT does not mean that CTT is outdated or replaced by IRT. Both CTT and IRT provide useful statistics to help us analyse test data. Generally, CTT and IRT provide complementary results. For many item analyses, CTT may be sufficient to provide the information we need. There are, however, theoretical differences between CTT and IRT, and many researchers prefer IRT because of enhanced measurement properties under IRT. IRT also provides a framework that facilitates test equating, computer adaptive testing and test score interpretation. While this book devotes a large part to IRT, we stress that CTT is an important part of the methodologies for educational and psychological measurement. In particular, the exposition of the concept of reliability in CTT sets the basis for evaluating measuring instruments. A good understanding of CTT lays the foundations for measurement principles. There are other approaches to measurement such as generalizability theory and structural equation modelling, but these are not the focus of attention in this book.
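A minimal Python sketch of the classical item statistics named above (item difficulty as the proportion answering correctly, item discrimination as the corrected item-total correlation); the response data are simulated for illustration, not taken from any of the studies listed here.

```python
import numpy as np

def ctt_item_statistics(scores: np.ndarray):
    """Classical item difficulty (p-value) and corrected item-total correlation."""
    stats = []
    for j in range(scores.shape[1]):
        item = scores[:, j]
        rest = np.delete(scores, j, axis=1).sum(axis=1)   # total score without the item
        difficulty = item.mean()                          # proportion answering correctly
        discrimination = np.corrcoef(item, rest)[0, 1]    # correlation with the rest score
        stats.append((difficulty, discrimination))
    return stats

rng = np.random.default_rng(2)
simulated = (rng.random((200, 10)) < rng.uniform(0.3, 0.9, 10)).astype(int)

for j, (p, r) in enumerate(ctt_item_statistics(simulated), start=1):
    print(f"item {j}: difficulty = {p:.2f}, discrimination = {r:.2f}")
```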