Abstract
Generalizability theory (G theory) provides a broad conceptual framework for social sciences such as psychology and education, and a comprehensive structure for many measurement situations, by building on analysis of variance, a powerful statistical method. As an extension of both classical test theory and analysis of variance, G theory is a model that can handle multiple sources of error. Several software programs can be used to conduct G theory analyses, including GENOVA, SPSS, SAS, EduG, and G-String. In this study, the general perspectives of G theory are first explained. Then SPSS and EduG are used to conduct generalizability analyses on data obtained from the answers of 30 students (p) to nine open-ended questions (i), as rated by three raters (r). The study uses three designs: two random effects designs, p × i × r and p × (i:r), and a p × i × r design with a fixed rater facet. As expected, SPSS and EduG give the same results for the variance component estimates and for the G (generalizability) and D (decision) studies of all designs. In addition to comparing the program outputs, the study discusses the strengths and weaknesses of the programs with respect to different designs and data sets.

Keywords: Generalizability Theory * G Study * D Study * SPSS * EduG

G theory has formed a comprehensive structure by employing variance analysis, and it provides a broad conceptual framework for social sciences such as psychology and education (Brennan, 2000, 2001a; Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991). It is also a powerful statistical tool for situations that involve numerous measurements.
The theory, as an extension of classical test theory and variance analysis, is a model in which multiple sources of error can be handled (Brennan, 2001a; Shavelson & Webb, 1991).

Generalizability (G) Theory

The reliability of measurement results in education and psychology has traditionally been examined using classical test theory (CTT). CTT assumes that the observed score is composed of a true score and a single error term that cannot be separated into its sources. This assumption is restrictive, especially in performance measurements, where more than one source of error is likely, and it highlights the importance of G theory, in which multiple sources of error are handled and estimated simultaneously (Brennan, 2000). Another advantage of G theory in performance assessment is that, whereas CTT relies on a restrictive assumption of parallel measurements, G theory adopts the weaker assumption of randomly parallel measurements (Brennan, 2011; Kretchmar, 2006). The main aim of G theory is to generalize the scores of a specific measurement tool from a specific group to the universe of generalization, which consists of 1) the universe of admissible observations and generalizability studies (G studies), and 2) the universe of G studies and decision studies (D studies). G studies provide an estimate of the generalizability coefficient from the variances of all facets, and this coefficient includes the examinee's universe score, whereas D studies make it possible to examine the interactions among all applicable facets (tasks, raters, observations, etc.) and the subject of measurement in order to calculate the dependability coefficient (Brennan, 2000; Crocker & Algina, 1986; Hsu, 2012).

G theory has four main advantages compared to CTT. 1) It provides simultaneous evaluation of test-retest reliability, internal consistency, inter-rater reliability, and convergent validity. 2) It enables estimation of the effects of both individual measurement facets and their interactions.
3) When assessing an examinee's performance, it provides information about the examinee's absolute level of knowledge as well as the examinee's standing in a rank ordering. 4) It allows researchers to optimize the reliability of an assessment within the constraints of time and money. …
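The G study and D study steps described above can be illustrated with a minimal sketch for the fully crossed, random effects p × i × r design. It assumes one score per cell, the standard ANOVA expected-mean-square estimators, and the common convention of setting negative variance estimates to zero. The function names (`g_study_pxixr`, `d_study`) and the simulated scores are hypothetical and are not the study's data or the SPSS/EduG procedures; only the dimensions (30 persons, 9 items, 3 raters) follow the text.

```python
import numpy as np

def g_study_pxixr(x):
    """Estimate variance components for a crossed p x i x r random design.

    x has shape (n_p, n_i, n_r) with exactly one score per cell.
    """
    n_p, n_i, n_r = x.shape
    grand = x.mean()
    m_p, m_i, m_r = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pi, m_pr, m_ir = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for main effects and two-way interactions
    ss_p = n_i * n_r * np.sum((m_p - grand) ** 2)
    ss_i = n_p * n_r * np.sum((m_i - grand) ** 2)
    ss_r = n_p * n_i * np.sum((m_r - grand) ** 2)
    ss_pi = n_r * np.sum((m_pi - m_p[:, None] - m_i[None, :] + grand) ** 2)
    ss_pr = n_i * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2)
    ss_ir = n_p * np.sum((m_ir - m_i[:, None] - m_r[None, :] + grand) ** 2)
    # Residual: three-way interaction confounded with random error
    ss_pir = np.sum((x - grand) ** 2) - ss_p - ss_i - ss_r - ss_pi - ss_pr - ss_ir

    # Mean squares
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    ms_ir = ss_ir / ((n_i - 1) * (n_r - 1))
    ms_pir = ss_pir / ((n_p - 1) * (n_i - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for the variance components
    return {
        "p": (ms_p - ms_pi - ms_pr + ms_pir) / (n_i * n_r),
        "i": (ms_i - ms_pi - ms_ir + ms_pir) / (n_p * n_r),
        "r": (ms_r - ms_pr - ms_ir + ms_pir) / (n_p * n_i),
        "pi": (ms_pi - ms_pir) / n_r,
        "pr": (ms_pr - ms_pir) / n_i,
        "ir": (ms_ir - ms_pir) / n_p,
        "pir_e": ms_pir,
    }

def d_study(var, n_i, n_r):
    """Return (G coefficient, Phi coefficient) for n_i items and n_r raters."""
    v = {k: max(val, 0.0) for k, val in var.items()}  # assumption: clamp negatives to 0
    rel_err = v["pi"] / n_i + v["pr"] / n_r + v["pir_e"] / (n_i * n_r)
    abs_err = rel_err + v["i"] / n_i + v["r"] / n_r + v["ir"] / (n_i * n_r)
    return v["p"] / (v["p"] + rel_err), v["p"] / (v["p"] + abs_err)

# Simulated scores (hypothetical): 30 persons, 9 items, 3 raters
rng = np.random.default_rng(0)
scores = (
    5.0
    + rng.normal(0, 1.0, 30)[:, None, None]   # person effects
    + rng.normal(0, 0.5, 9)[None, :, None]    # item effects
    + rng.normal(0, 0.3, 3)[None, None, :]    # rater effects
    + rng.normal(0, 1.0, (30, 9, 3))          # residual
)
variance_components = g_study_pxixr(scores)

# D study: compare coefficients for alternative numbers of items and raters
for n_i, n_r in [(9, 3), (9, 2), (6, 3), (12, 1)]:
    g_coef, phi_coef = d_study(variance_components, n_i, n_r)
    print(f"n_i={n_i:2d} n_r={n_r}: G={g_coef:.3f} Phi={phi_coef:.3f}")
```

Varying `n_i` and `n_r` in the final loop mirrors the fourth advantage listed above: it shows how reliability changes when the number of items or raters, and therefore the cost of the assessment, is changed. The nested p × (i:r) design and the fixed rater design would need different estimators and are not covered by this sketch.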