Authentic Mathematics Assessment Using an Integrated Deep Learning and Adiwiyata Testlet Model for Elementary Schools and Madrasah Ibtidaiyah
This study aimed to develop an authentic mathematics assessment for elementary schools in the form of a testlet-based instrument integrated with deep learning principles and the Adiwiyata context within the framework of transformative Islamic education in the Special Region of Yogyakarta. Employing basic research with an embedded mixed-methods design, the development process adapted the Plomp model and the instrument development framework of Oreondo and Antonio. Data were collected through teacher needs surveys, expert validation, readability testing, and field trials involving 462 fifth-grade students from elementary schools and Islamic elementary schools. Quantitative analyses included Content Validity Index (CVI), Aiken’s V, Cronbach’s Alpha, Classical Test Theory (CTT), and Item Response Theory (IRT) using the 2-PL Graded Response Model. The results indicate that the developed instruments meet acceptable psychometric standards, with Aiken’s V values ranging from 0.75 to 1.00 and high internal consistency for both the testlet instrument (Cronbach’s Alpha = 0.845) and the environmental awareness questionnaire (Cronbach’s Alpha = 0.850). Item analysis shows adequate discrimination and a structured progression of difficulty, although one item exhibited low discrimination in the 2-PL GRM, highlighting the importance of IRT-based diagnostics for testlet refinement. Descriptive findings reveal that students demonstrate high levels of environmental awareness, particularly in the knowledge and attitude dimensions, while mathematical achievement remains low on non-routine items. Correlation analysis shows no significant relationship between environmental awareness and mathematical ability. Methodologically, this study contributes a validated and contextually grounded assessment framework that integrates expert judgment, reliability analysis, and complementary CTT–IRT procedures. 
Theoretically, the findings reconceptualize authentic assessment as a diagnostic bridge rather than a direct causal link between affective values and cognitive performance, demonstrating that environmental concern functions as a potential cognitive resource only when explicitly activated within mathematical tasks.
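The Aiken's V index reported above has a simple closed form, V = Σs / (n(c − 1)), where s is each rater's rating minus the lowest possible rating, n is the number of raters, and c is the number of rating categories. A minimal Python sketch (the function name and the 1-5 relevance scale are illustrative assumptions, not details from the study):

```python
def aikens_v(ratings, lo=1, hi=5):
    """Aiken's V for one item: V = sum(s) / (n * (c - 1)),
    where s = rating - lo and c is the number of rating categories."""
    n = len(ratings)
    c = hi - lo + 1
    s_total = sum(r - lo for r in ratings)
    return s_total / (n * (c - 1))

# Example: five experts rate an item 4, 5, 5, 4, 5 on a 1-5 relevance scale.
print(round(aikens_v([4, 5, 5, 4, 5], lo=1, hi=5), 2))  # 0.9
```

Values near 1.0 indicate strong expert agreement that the item is relevant, which is how the 0.75-1.00 range above supports content validity.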
- Research Article
- 10.24235/eduma.v2i2.43
- Nov 7, 2013
- Eduma : Mathematics Education Learning and Teaching
Critical mathematical thinking ability is very important for solving daily problems, but in reality junior high school students' critical mathematical thinking ability is still low. An ability such as critical mathematical thinking cannot be measured well with a multiple-choice test; an essay test scored with a graded scoring technique is more suitable. The results of the essay test are then analyzed to describe the ability being tested. There are two approaches to measurement analysis: classical test theory and item response theory (IRT). Classical test theory has some weaknesses because it depends only on the number of right answers a student achieves, whereas IRT is more suitable for analyzing ability because it relies on the pattern of responses and the parameters of the test items. The Graded Response Model (GRM) is one of the IRT models for analyzing graded responses. The purposes of this research are to determine the item parameter estimates of the test developed by the researcher and to determine the estimates of students' critical mathematical thinking ability under the GRM. This is a descriptive quantitative study. The population consists of 8th-grade students of MTs Al-Ishlah Bobos and SMP N 1 Dukupuntang in the 2012/2013 academic year. Using purposive sampling, 140 students were taken as a sample: 70 from MTs Al-Ishlah Bobos and 70 from SMP N 1 Dukupuntang. The measurement theory used in this research is Item Response Theory (IRT) with the GRM, and the data-collection instrument is a critical mathematical thinking ability test.
The item parameter estimates show that, in terms of item discrimination, all four items tested are less than good, while the item difficulties vary: the first item is easy, the second and third items are very difficult, and the last item is difficult. The ability parameter estimates show that 4.2% of students have very high critical mathematical thinking ability, 16.4% high, 65.7% average, and 13.5% low, with no student at the very low level. Key words: critical mathematical thinking, item parameters, ability parameters, IRT, GRM
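Under Samejima's GRM used above, each category probability is the difference of two adjacent boundary curves: P(X = k | θ) = P*_k(θ) − P*_{k+1}(θ), with P*_k(θ) = 1 / (1 + exp(−a(θ − b_k))). A self-contained Python sketch (the parameter values are illustrative, not estimates from this study):

```python
import math

def grm_category_probs(theta, a, b):
    """Category response probabilities under Samejima's Graded Response Model.
    theta: ability; a: item discrimination; b: ordered boundary difficulties
    for m+1 score categories (len(b) == m)."""
    # Cumulative (boundary) probabilities P*(X >= k | theta)
    p_star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    # Category probability = difference of adjacent boundary curves
    return [p_star[k] - p_star[k + 1] for k in range(len(b) + 1)]

# Example: a 4-category item (scores 0-3) with a = 1.2 and boundaries -1, 0, 1.
probs = grm_category_probs(theta=0.0, a=1.2, b=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # the four probabilities sum to 1
```

The discrimination parameter a reported as "less good" for all four items above would show up here as flat boundary curves, spreading probability thinly across categories at every ability level.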
- Research Article
- 10.32806/jf.v6i2.3101
- Dec 28, 2017
- FIKROTUNA
The Islamic Elementary School is a hybrid Islamic educational institution that grew out of non-formal education in Islamic boarding schools, taking the formal shape of the madrasah while continuing the elementary school tradition of the Sekolah Rakyat (SR). The Sekolah Rakyat was an elementary school established by the Dutch colonial government to meet the basic educational needs of local students in Indonesia, and it was later transformed into the modern elementary school. The Islamic Elementary School was created to combine this colonial educational heritage with the heritage of Indonesian Islamic education, while still carrying the formal education mission of the madrasah in general. It represents a Muslim response expressed through innovation and creativity in developing educational institutions within a complex society. Such schools do not always use an Islamic label; some still call themselves elementary schools but blend the curricula of the elementary school and the Madrasah Ibtidaiyah, so that in content they become Islamic Elementary Schools. One example of this combination of the elementary school and the Madrasah Ibtidaiyah is Elementary Full Day School al-Baitul Amien Jember, the focus of this paper.
- Research Article
- 10.1002/alz.057664
- Dec 1, 2021
- Alzheimer's & Dementia
Background: The Amsterdam IADL Questionnaire (A-IADL-Q) is increasingly being used in Alzheimer's disease (AD) trials to assess activities of daily living. The authors of this scale (Sikkes et al., 2012) originally proposed Item Response Theory (IRT)-based scoring. While such an approach has some potential advantages, it requires a complex model and assumptions about population score distributions, and it can be difficult to interpret clinically. An alternative to IRT scoring is the Classical Test Theory (CTT) method, which is relatively simple and easier to interpret. The present study compared IRT and CTT for scoring the A-IADL as a function of global cognition in healthy older adults and patients with mild-to-moderate AD. Method: Aggregated data from three multinational clinical trials including subjects at the preclinical, early symptomatic, and moderate dementia phases of AD were analyzed. A-IADL and CDR assessments at initial visits were evaluated, and participants were classified into three groups based on CDR Global Scores: 0, 0.5, and 1. For IRT scoring, the Graded Response Model (GRM) was used to estimate each individual's latent score. For the CTT method, the scaled average of scored responses was calculated. ROC analysis was conducted to evaluate the utility of each method in distinguishing between CDR groups. Results: There were a total of 2,694 A-IADL assessments across the three CDR groups. There was a very high correlation between the CTT and IRT estimates of the A-IADL total score (r = 0.996, p < 0.05). As the CDR Global Score increased, A-IADL scores declined under both methods. In IRT scoring, the item characteristic curves showed overlap for most item responses, and the test information curve was narrow with a peak to the right of theta 0.
Importantly, the ROC curve showed slightly better performance for CTT [AUC: 0.829 (0.812-0.847)] compared to IRT [AUC: 0.809 (0.788-0.830)]. Conclusion: The current study found that CTT scoring of the A-IADL performed slightly better in distinguishing AD populations as a function of global cognition/function. Given the procedural complexities and required assumptions of IRT scoring, the CTT method, which has the advantages of being more straightforward and familiar to clinicians, represents a better choice for most applications of this instrument.
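The ROC comparison above reduces to the area under the curve, which for raw scores equals the Mann-Whitney probability that a randomly chosen case from one group outscores one from the other. A toy Python sketch (the group labels, score values and `auc` helper are illustrative, not the study's data):

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC = P(score_pos > score_neg) + 0.5 * P(tie),
    the Mann-Whitney formulation of the ROC area."""
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Toy comparison of CTT-style sum scores for impaired (CDR > 0) vs.
# unimpaired (CDR = 0) groups -- purely illustrative values.
ctt_impaired, ctt_unimpaired = [40, 45, 50], [60, 62, 70]
# Lower functional score = more impairment, so the unimpaired group is "positive".
print(auc(ctt_unimpaired, ctt_impaired))  # 1.0 -- perfect separation in this toy data
```

Running the same computation on CTT sum scores and on IRT theta estimates is how AUCs like the 0.829 vs. 0.809 above can be compared directly.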
- Front Matter
- 10.1016/s1551-7144(09)00212-2
- Jan 1, 2010
- Contemporary Clinical Trials
Classical and modern measurement theories, patient reports, and clinical outcomes
- Research Article
- 10.12738/estp.2017.2.0246
- Jan 1, 2017
- Educational Sciences: Theory & Practice
Tests used for purposes such as determining educational quality, defining educational needs, hiring employees, student selection and placement, and providing guidance and clinical services have an important place in education and psychology. They must, of course, have certain psychometric qualities related to the validity and reliability of test scores. Various test theories have helped to create more valid and reliable measurements and, as a result, to make better decisions about individuals. In education and psychology, Classical Test Theory (CTT) and Item Response Theory (IRT) are both widely used. CTT assumes that an individual's observed score is the sum of a true score and an error score, while IRT estimates an individual's ability or latent trait from responses to test items (Embretson & Reise, 2000). When IRT assumptions and model-data fit are satisfied, item and ability parameters are invariant; this is regarded as the most important advantage of IRT over CTT. Invariance of item and ability parameters means that ability parameters can be estimated independently of the item sample and item parameters independently of the ability sample. This invariance property makes IRT very practical in many applications, for instance test development, computerized adaptive testing, bias studies, test equating and item mapping (Hambleton & Swaminathan, 1985). IRT is classified into two main categories: parametric IRT (PIRT) and nonparametric IRT (NIRT) (Olivares, 2005; Sijtsma & Molenaar, 2002). To analyse ordered items, such as Likert-type attitude items and partial-credit cognitive items, or non-ordered graded items such as multiple-choice test items, polytomous item response models have been developed within IRT (Ostini & Nering, 2006). These models specify a non-linear relationship between an individual's latent trait and the probability of choosing a particular response category (Embretson & Reise, 2000).
The Graded Response Model (GRM), one of the IRT models developed for polytomous items, is often preferred by researchers because it is useful for presentations, portfolios, essays and Likert-type items with ordered categories (DeMars, 2010; Ostini & Nering, 2006). To scale tests consisting of polytomous items with accurate estimates under the GRM, the assumptions and model-data fit of PIRT must be evaluated, and satisfying them requires large samples. At this point, NIRT models draw attention because they offer a practical advantage: the psychometric properties of tests can be determined with fewer items and respondents (Stout, 2001). NIRT models are statistical scaling methods that require fewer assumptions than PIRT models for measuring persons and items (Stochl, 2007). With their wide range of application, NIRT models are used for ordinal scales and in applied research areas such as sociology, marketing research and health research on quality of life (Sijtsma, 2005). The literature reveals two main families of models, namely the Mokken models and the nonparametric regression estimation models, each divided into sub-models. The Mokken family consists of the Monotone Homogeneity Model (MHM) and the Double Monotonicity Model (DMM). The nonparametric regression estimation models include the Kernel Smoothing Approach Model (KSAM), Isotonic Regression Estimation and Smoothed Isotonic Regression Estimation (Lee, 2007; Sijtsma & Molenaar, 2002). As theoretical work continues, new sub-models are being added to the nonparametric regression estimation family. As a NIRT model, the MHM requires the assumptions of unidimensionality, local independence and monotonicity, and it describes the relationship between latent variables and items with homogeneous (unidimensional) and monotone item characteristic curves (ICCs) (Meijer & Baneke, 2004; Sijtsma & Molenaar, 2002).
…
- Research Article
- 10.1080/13678868.2025.2558568
- Sep 16, 2025
- Human Resource Development International
This method paper provides basic information about Item Response Theory (IRT). Item Response Theory is not a theory in the traditional sense; it is a psychometric theory that mathematically relates a respondent's latent trait or ability score to the probability of choosing a certain answer to an instrument's item. IRT is a fairly new method of measurement construction relative to Classical Test Theory (CTT). In this paper we briefly discuss CTT and its limitations. Then we introduce the foundational concepts of IRT. Our focus in this article is the Graded Response Model (GRM), one of the IRT models used for polytomous items such as Likert-scale items. Using simulated data, we demonstrate the use of the GRM on Likert-scale data. In addition, our paper emphasises the importance of measurement invariance for international Human Resource Development, and we explain and demonstrate measurement invariance analysis using IRT. We close our article by discussing the benefits of using IRT and conducting measurement invariance analysis in international Human Resource Development.
- Research Article
- 10.5530/jyp.2019.11.39
- May 1, 2019
- Journal of Young Pharmacists
Objective: To develop and validate the Patients' Knowledge, Attitudes and Practices Instrument for Uncomplicated Malaria (PKAPIUM) through Classical Test Theory (CTT) complemented by Item Response Theory (IRT). Methods: A draft 31-item scale was developed using relevant variables from the literature and initially screened by six experts before being used to collect data from 300 patients receiving treatment for uncomplicated malaria in Primary Health Care (PHC) facilities in Plateau state, Nigeria. An orchestrated classical and modern psychometric approach, including CTT and IRT, was then used to validate the draft instrument through IBM® Statistical Package for the Social Sciences (SPSS®) version 23, Analysis of Moment Structures (AMOS™) software version 22 and Bond & Fox software®, respectively. Results: The 31-item draft scale showed good Item Content Validity Index values (I-CVI > 0.8) with a good Universal Agreement level of the Scale Content Validity Index (S-CVI/UA, 0.9-1) and average CVI (S-CVI/Ave, 0.98-1). The CTT and Rasch analyses resulted in the retention of twenty-one items distributed across the Knowledge, Attitude and Practice (KAP) constructs, with Average Variance Extracted (AVE), square-root AVE, chi-square, Standardized Root Mean Square Residual (SRMR), Root Mean Square Error of Approximation (RMSEA), item infit Mean Square (MNSQ), infit Standardized Z-scores (infit Zstds), Point-Measure Correlation Coefficients (PTMEA Corr), Cronbach's alpha, and item and person reliability indices all within accepted limits. Conclusion: The new scale was considered valid and reliable for assessing patients' knowledge, attitudes and practices regarding uncomplicated malaria. Key words: Cronbach's alpha, Factor analysis, Infit and outfit indices, Person and item reliability, Point-measure correlation coefficients, Standardized Z-score
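The content validity indices reported above (I-CVI, S-CVI/UA and S-CVI/Ave) are all simple proportions over expert relevance ratings. A minimal Python sketch, assuming the conventional 4-point relevance scale on which ratings of 3 or 4 count as relevant (the function name and ratings are illustrative, not the study's data):

```python
def content_validity(ratings_by_item, relevant=(3, 4)):
    """ratings_by_item: one list of expert ratings per item, on a 4-point
    relevance scale. Returns (I-CVI per item, S-CVI/UA, S-CVI/Ave)."""
    # I-CVI: proportion of experts rating the item as relevant (3 or 4)
    i_cvi = [sum(r in relevant for r in item) / len(item) for item in ratings_by_item]
    # S-CVI/UA: proportion of items on which all experts agree (I-CVI == 1)
    s_cvi_ua = sum(v == 1.0 for v in i_cvi) / len(i_cvi)
    # S-CVI/Ave: mean of the item-level I-CVIs
    s_cvi_ave = sum(i_cvi) / len(i_cvi)
    return i_cvi, s_cvi_ua, s_cvi_ave

# Example: three items rated by six experts (illustrative ratings).
items = [[4, 4, 3, 4, 3, 4], [4, 3, 4, 4, 4, 2], [3, 4, 4, 4, 4, 4]]
i_cvi, ua, ave = content_validity(items)
print(i_cvi, round(ua, 2), round(ave, 2))
```

Thresholds like I-CVI > 0.8 and S-CVI/Ave ≥ 0.9 above are then read directly off these proportions.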
- Research Article
- 10.1037/a0036430
- Jan 1, 2014
- Psychological Assessment
[Correction Notice: An Erratum for this article was reported in Vol 26(3) of Psychological Assessment (see record 2014-16017-001). The mean, standard deviation and alpha coefficient originally reported in Table 1 should be 74.317, 10.214 and .802, respectively. The validity coefficients in the last column of Table 4 are affected as well. Correcting this error did not change the substantive interpretations of the results, but did increase the mean, standard deviation, alpha coefficient, and validity coefficients reported for the Honesty subscale in the text and in Tables 1 and 4. The corrected versions of Tables 1 and Table 4 are shown in the erratum.] Item response theory (IRT) models were applied to dichotomous and polytomous scoring of the Self-Deceptive Enhancement and Impression Management subscales of the Balanced Inventory of Desirable Responding (Paulhus, 1991, 1999). Two dichotomous scoring methods reflecting exaggerated endorsement and exaggerated denial of socially desirable behaviors were examined. The 1- and 2-parameter logistic models (1PLM, 2PLM, respectively) were applied to dichotomous responses, and the partial credit model (PCM) and graded response model (GRM) were applied to polytomous responses. For both subscales, the 2PLM fit dichotomous responses better than did the 1PLM, and the GRM fit polytomous responses better than did the PCM. Polytomous GRM and raw scores for both subscales yielded higher test-retest and convergent validity coefficients than did PCM, 1PLM, 2PLM, and dichotomous raw scores. Information plots showed that the GRM provided consistently high measurement precision that was superior to that of all other IRT models over the full range of both construct continuums. Dichotomous scores reflecting exaggerated endorsement of socially desirable behaviors provided noticeably weak precision at low levels of the construct continuums, calling into question the use of such scores for detecting instances of "faking bad." 
Dichotomous models reflecting exaggerated denial of the same behaviors yielded much better precision at low levels of the constructs, but it was still less precision than that of the GRM. These results support polytomous over dichotomous scoring in general, alternative dichotomous scoring for detecting faking bad, and extension of GRM scoring to situations in which IRT offers additional practical advantages over classical test theory (adaptive testing, equating, linking, scaling, detecting differential item functioning, and so forth).
- Research Article
- 10.1093/eurpub/ckab164.646
- Oct 20, 2021
- European Journal of Public Health
Background The proportion of frail older adults is increasing and is expected to further increase in the coming years, both globally and in the Dutch population. This poses a great challenge to public health. To determine the prevalence of frailty in a population, a frailty index (FI) is recommended. A FI is an accumulation model encompassing health deficits in multiple domains. Previous research has shown that a FI can be created out of existing health surveys, since it is a flexible instrument, fairly insensitive to the use of specific items. However, this is based on scale development using Classical Test Theory, while few studies have investigated the psychometric properties of their FI using Item Response Theory (IRT). The aim of this study was to create a FI using the Dutch Health Monitor 2016, and to investigate its psychometric properties using Item Response Theory (IRT). Methods Forty-two deficits were selected in three health domains, i.e., physical, psychological, and social. Psychometric properties were investigated by using an IRT model for polytomous response categories: the Graded Response Model (GRM). Items were evaluated by Cronbach's Alpha, Factor Analysis, Point Polyserial Correlations, and GRM. Results The analyses showed that all items demonstrated a positive association with the scale. However, five items did not fit well to the FI scale. From the physical domain these were body mass index and three items about adherence to physical activity guidelines: moderate activity per week; bone and muscle strengthening activities; balance exercises. From the psychological domain this was an item about a sense of control over one's own future. Conclusions By using IRT, we showed that while 37 items were adequate and fitted the scale well, five items in our FI were redundant, indicating that it does matter which items are selected for a FI. IRT is a strong method for item selection and thus for creating a more concise Frailty Index. 
Key messages Creating a solid and more concise Frailty Index with IRT is promising for epidemiological research and public health. For creating a Frailty Index, item selection needs careful consideration.
- Research Article
- 10.3389/fpsyg.2023.1035071
- Feb 1, 2023
- Frontiers in psychology
To validate the hepatitis B virus infection-related stigma scale (HBVISS) using Classical Test Theory and Item Response Theory in a sample of Chinese chronic HBV carriers. Feasibility, internal consistency reliability, split-half reliability and construct validity were evaluated in a cross-sectional validation study (n = 1,058) under Classical Test Theory. Content validity was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) criteria. The Item Response Theory (IRT) model parameters were estimated using Samejima's graded response model, after which item response category characteristic curves were drawn. Item information, test information, and IRT-based marginal reliability were calculated. Measurement invariance was assessed using differential item functioning (DIF) analysis. SPSS and R software were used for the analyses. The response rate reached 96.4% and the scale was completed in an average time of 5 min. The content validity of the HBVISS was sufficient (+) and the quality of the evidence was high according to the COSMIN criteria. Confirmatory factor analysis showed acceptable goodness of fit (χ²/df = 5.40, standardized root mean square residual = 0.057, root mean square error of approximation = 0.064, goodness-of-fit index = 0.902, comparative fit index = 0.925, incremental fit index = 0.926, and Tucker-Lewis index = 0.912). Cronbach's α fell in the range 0.79-0.89 for each dimension and was 0.93 for the total scale. Split-half reliability was 0.96. IRT discrimination parameters were estimated to range between 0.959 and 2.333, and the threshold parameters ranged from -3.767 to 3.894. The average test information was 12.75 (information > 10) across theta levels between -4 and +4. The IRT-based marginal reliability was 0.95 for the total scale and fell in the range 0.83-0.91 for each dimension. No differential item functioning was detected (ΔR² < 0.02), supporting measurement invariance.
The HBVISS exhibited good feasibility, reliability, validity, and item quality, making it suitable for assessing chronic hepatitis B virus infection-related stigma.
- Book Chapter
- 10.1007/978-981-10-3302-5_5
- Jan 1, 2016
Classical Test Theory (CTT), also known as the true score theory, refers to the analysis of test results based on test scores. The statistics produced under CTT include measures of item difficulty, item discrimination, measurement error and test reliability. The term “Classical” is used in contrast to “Modern” test theory which usually refers to item response theory (IRT). The fact that CTT was developed before IRT does not mean that CTT is outdated or replaced by IRT. Both CTT and IRT provide useful statistics to help us analyse test data. Generally, CTT and IRT provide complementary results. For many item analyses, CTT may be sufficient to provide the information we need. There are, however, theoretical differences between CTT and IRT, and many researchers prefer IRT because of enhanced measurement properties under IRT. IRT also provides a framework that facilitates test equating, computer adaptive testing and test score interpretation. While this book devotes a large part to IRT, we stress that CTT is an important part of the methodologies for educational and psychological measurement. In particular, the exposition of the concept of reliability in CTT sets the basis for evaluating measuring instruments. A good understanding of CTT lays the foundations for measurement principles. There are other approaches to measurement such as generalizability theory and structural equation modelling, but these are not the focus of attention in this book.
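The CTT statistics this chapter names, item difficulty, discrimination, measurement error and reliability, can be computed directly from a scored response matrix. A minimal Python sketch for dichotomous items showing item difficulty (proportion correct) and Cronbach's alpha (the function name and data are illustrative):

```python
def ctt_item_stats(responses):
    """responses: rows = examinees, columns = items, scored 0/1.
    Returns per-item difficulty (p-values) and Cronbach's alpha."""
    n_persons = len(responses)
    n_items = len(responses[0])
    # Item difficulty: proportion of examinees answering correctly
    p = [sum(row[j] for row in responses) / n_persons for j in range(n_items)]
    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    # Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / total-score variance)
    totals = [sum(row) for row in responses]
    item_vars = [var([row[j] for row in responses]) for j in range(n_items)]
    alpha = n_items / (n_items - 1) * (1 - sum(item_vars) / var(totals))
    return p, alpha

# Illustrative 4-person x 3-item response matrix.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
p, alpha = ctt_item_stats(data)
print(p, round(alpha, 3))
```

As the chapter notes, such CTT summaries are often sufficient for routine item analysis, with IRT reserved for applications like equating and adaptive testing.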
- Research Article
- 10.1158/1538-7445.am2013-1367
- Apr 15, 2013
- Cancer Research
BACKGROUND: Patient satisfaction (PS) is an important outcome measure of quality of cancer-related care. PS was one of the four core study outcomes of the National Cancer Institute and American Cancer Society funded $25 million multicenter Patient Navigation Research Program (PNRP) to reduce disparities in cancer care. A Patient Satisfaction with Cancer Care (PSCC) measure was developed and validated for the PNRP using classical test theory and principal components analysis (PCA). OBJECTIVE: To calibrate items of the PSCC to facilitate the development of a computerized adaptive testing (CAT) system, which can be used to tailor the PSCC to patients' satisfaction level based on properties of the items. METHODS: The PCA revealed a unidimensional PSCC measure. Thus, we applied unidimensional item response theory (IRT) models to the 18-item PSCC data from 1,296 participants (73% female; age 18 to 86 years). We fitted two IRT models to the data: an unconstrained graded response model (GRM) and a constrained GRM (i.e., Rasch model) in which all discrimination parameters across items were fixed to be equal. We obtained model fit indices (log-likelihood, AIC and BIC) and performed model comparison through a likelihood ratio (LR) test between the unconstrained GRM and the Rasch model. We obtained item and latent trait (i.e., patient satisfaction) parameter estimates, category characteristic curves, operating characteristic curves, and test information curves for the better fitting model. RESULTS: The unconstrained GRM fitted the data significantly better (LR = 828, df = 17, p < 0.001). Item parameter estimates showed strong item discriminating power (α = 0.94 to 2.18). Standard errors (SE) of the item parameter estimates were also small (i.e., mostly around 0.1 for the threshold parameters, and between 0.1 and 0.2 for the discrimination parameters), confirming the precision of the item parameter estimates obtained.
CONCLUSIONS: The PSCC is suitable to be delivered through a CAT system where patients will receive tailored optimally selected items to measure their satisfaction levels, and scores will be equated across different subsets of items (i.e., test forms). An IRT-based PSCC CAT system will provide key patient reported outcome data to help improve patient-centered cancer care and satisfaction for medically underserved populations. Citation Format: Pascal Jean-Pierre, Ying Cheng, Steven Patierno, Peter Raich, Richard Roetzheim, Steven Rosen, Donald Dudley, Karen Freund, Victoria Warren-Mears, Electra Paskett, Kevin Fiscella. Item response theory analysis of the patient satisfaction with cancer-related care: psychometric validation in a multicultural sample of 1,296 participants. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 1367. doi:10.1158/1538-7445.AM2013-1367
- Research Article
- 10.21031/epod.559470
- Sep 4, 2019
- Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi
The aim of the study is to examine reliability estimates for a written expression skills analytic rubric based on Classical Test Theory (CTT), Generalizability Theory (GT) and Item Response Theory (IRT), which differ in their theoretical approaches. In this descriptive study, the stories of the 523 students in the study group were scored by seven raters. CTT results showed that the Eta coefficient revealed no difference between the raters' scoring (η = .926); Cronbach's alpha coefficients were above .88. GT results showed that the G and Phi coefficients were above .97. The expected differentiation among students emerged, the difficulty levels of the criteria did not change from one student to another, and the consistency between raters' scores was excellent. Under Item Response Theory, parameters were estimated according to Samejima's (1969) Graded Response Model, and item discrimination differed across raters. According to the b parameters, for all raters, individuals are expected to be at ability levels of at least -2.35, -0.80 and 0.41 in order to be scored above categories 0, 1 or 2, respectively, with .50 probability. Marginal reliability coefficients were quite high (around .93). Fisher's Z' statistic was calculated to test the significance of the differences between the reliability estimates. GT revealed more detailed information than CTT in explaining sources of error variance and determining reliability, while IRT provided more detailed information than CTT in determining item-level error estimates and ability levels. There was a significant difference between the estimated parameters of CTT and GT for interrater reliability (p < .05); there was no significant difference between the parameters estimated according to CTT and IRT (p > .05).
- Research Article
- 10.1186/s12955-020-01620-9
- Nov 13, 2020
- Health and Quality of Life Outcomes
Background: Early prelingual auditory development (EPLAD) is a fundamental and important process in the speech and language development of infants and toddlers. The Infant-Toddler Meaningful Auditory Integration Scale (ITMAIS) is a widely used measurement tool for EPLAD; however, it has not yet undergone a comprehensive psychometric analysis. The aim of this research was to modify and verify the psychometric properties of the ITMAIS using a combination of Item Response Theory (IRT) and Classical Test Theory (CTT). Methods: In Stage 1, 1,730 children were retrospectively recruited to enable the application of an IRT model, specifically the graded response model, to modify the ITMAIS. In Stage 2, another 450 infants and toddlers with normal hearing or permanent hearing loss before auditory intervention were recruited to verify the psychometric properties of the modified ITMAIS (ITMAIS-m) using the CTT method. Results: On the metric of the graded response model, after removing item 2 from the ITMAIS, the ITMAIS-m demonstrated discrimination parameters ranging from 3.947 to 5.431, difficulty parameters from -1.146 to 1.150, item information distributed between 4.798 and 9.259, and a test information score of 48.061. None of the items showed differential item functioning. The ITMAIS-m was further verified in Stage 2, showing a Cronbach's α of 0.919 and item-total correlations ranging from 0.693 to 0.851. There was good convergent validity of the ITMAIS-m with another auditory outcome measure (r = 0.932) and with pure-tone average thresholds (r ranging from -0.670 to -0.909), as well as a high ability to discriminate between different hearing grades (Cohen's d ranging from 0.41 to 5.83). Conclusions: The ITMAIS-m is a reliable and valid tool for evaluating EPLAD in infants and toddlers, which can be applied efficiently and precisely in clinical practice. The combined use of IRT and CTT provides a powerful means of developing psychometrically robust scales for childhood auditory outcome measurement.
- Research Article
- 10.18860/jpai.v11i1.29301
- Nov 18, 2024
- J-PAI: Jurnal Pendidikan Agama Islam
Educational institutions strive to shape students into individuals with character and noble morals, enabling them to integrate the intellectual, emotional, and spiritual aspects of intelligence. Research on character education in elementary schools, specifically Madrasah Ibtidaiyah, presents diverse viewpoints. This study focuses on three issues. The first is the pressing need for character education in both elementary schools and Islamic elementary schools, also known as madrasah ibtidaiyah. The second concerns the various forms of character education students receive. The third concerns the implementation of character education models in Islamic elementary schools. The researchers employed the systematic literature review method. Articles were selected in five stages, and data analysis followed the PRISMA procedure. The review covered the last ten years, from 2014 to 2024, drawing on global data from Scopus and Google Scholar amounting to 483 articles, book reviews, proceedings, and conference papers. The study's results indicate that schools must implement character education for Islamic elementary school students effectively and efficiently through coaching, mentoring, and individual approaches. In addition, schools must strengthen educational components such as the curriculum, learning media, and madrasah programs, with government support through the education office. The types of character implemented in Islamic elementary schools include religious character, nationalism, mutual cooperation, social character, and discipline, supported by several character education models such as Thomas Lickona's character education model, the G-Gold Way model, the participatory observation learning model, and an education model based on strengthening collaboration between schools and parents.
The study's theoretical implications emphasize the importance of fostering a positive school culture as the primary solution.