Abstract

Professional standards identify three foundations for examining the psychometric quality of assessments: validity, reliability, and fairness. Reliability is therefore a primary concern for all assessments. Reliability is defined as the consistency of scores across replications. In education, the sources of measurement error, and hence the basis for replication, include items, forms, raters, and occasions; the source of error that is examined determines the type of reliability and, ultimately, the generalizations that can be made about the measurement. Inconsistency in scores may thus arise from multiple sources of random error, and the definition applies to multiple types of replications depending on the generalization to be made (e.g., across items, forms, raters, or occasions). There are also multiple indices for reporting reliability, including reliability coefficients, generalizability coefficients, standard errors of measurement, and information functions, to name a few. These indices are defined differently under different test theories. For example, classical test theory emphasizes reliability coefficients and standard errors of measurement; item response theory emphasizes information functions; generalizability theory emphasizes generalizability coefficients, dependability indices, and relative and absolute standard errors; and classification consistency emphasizes proportion agreement, unadjusted or adjusted for chance agreement. The importance of reliability depends on the uses made of the assessment: the higher the stakes of test use, the more important reliability becomes. Reliability standards are therefore expected to be applied more rigorously when tests are used to make high-stakes decisions about individuals, such as employment, certification, or clinical placement decisions. Although validity, which concerns the interpretations and uses of test scores, is considered the most important characteristic of a test, reliability provides a foundation for validity and is a necessary condition for most test uses and interpretations. When scores are not consistent within a testing procedure, they are instead being influenced by random errors of measurement; such scores will not relate strongly to other variables, will not show strong internal structure, and will not support the score uses and interpretations required for validity. Consequently, reliability is often considered necessary for the valid use and interpretation of scores. On the other hand, a test can have high reliability and still not be valid for a particular use or interpretation, because validity requires not only measuring consistently but also measuring the right construct.
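
To make two of the classical test theory indices named above concrete, the following Python sketch computes Cronbach's alpha (one common reliability coefficient) and the classical standard error of measurement (SEM) from a persons-by-items score matrix. The function names and the data are illustrative assumptions for this sketch, not material from the article.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                                # number of items
    item_variances = scores.var(axis=0, ddof=1)        # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def standard_error_of_measurement(total_scores, reliability):
    """Classical SEM: observed-score SD scaled by sqrt(1 - reliability)."""
    sd = np.asarray(total_scores, dtype=float).std(ddof=1)
    return sd * np.sqrt(1 - reliability)

# Hypothetical data: 5 examinees x 4 dichotomously scored items
X = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])

alpha = cronbach_alpha(X)
sem = standard_error_of_measurement(X.sum(axis=1), alpha)
print(f"alpha = {alpha:.3f}, SEM = {sem:.3f}")
```

In this framing, alpha summarizes consistency across items (the replication basis), while the SEM expresses the same information on the score scale, which is why both indices are typically reported together under classical test theory.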
