Abstract

In education research, discussion of reliability and validity is common and often unavoidable. In such discussions, we often encounter statements such as “the test is reliable” and/or “the test is valid.” These and similar statements appear innocuous, and their meaning is often assumed to be self-explanatory. However, such statements may reflect conscious or unconscious misconceptions about measurement reliability and validity. This brief paper takes a critical look at why such language about reliability and validity should be avoided in education research practice.

In education research, reliability and validity are important and fundamental concepts, and almost all education research involves some form of assessment or measurement. Some researchers describe a test or instrument used in a study with statements such as “the test is reliable” and/or “the test is valid.” These and similar descriptions are common. Unfortunately, they may incorrectly assume, and convey to the audience, that reliability and validity are inherent characteristics of a test, and that these characteristics would therefore hold in other research situations. Many have pointed out that reliability and validity are actually characteristics of measurement, that is, of the scores obtained from administering a test to a specific group, rather than characteristics of the test itself. Worthen et al. (1999) noted that reliability does not refer to the instrument itself: “Technically, reliability refers to the consistency of the results obtained, not to the instrument itself” (p. 95). Crocker and Algina (1986) stated that “a test is not ‘reliable’ or ‘unreliable’. Rather, reliability is a property of the scores on a test for a particular group of examinees” (p. 144).
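The point that reliability belongs to scores from a particular group, not to the test, can be illustrated with a small numerical sketch. The example below is not from the paper: it assumes Cronbach's alpha as the reliability estimate, and the two score matrices (`group_a`, `group_b`) are made-up responses of two hypothetical groups to the same four-item test.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows (one score per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: the SAME four-item test given to two different groups.
# Group A is heterogeneous (scores span the full range).
group_a = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
]
# Group B is homogeneous (restricted range: everyone scores 4 or 5).
group_b = [
    [4, 5, 4, 4],
    [5, 5, 4, 5],
    [4, 4, 5, 4],
    [5, 5, 5, 4],
    [4, 4, 4, 5],
    [5, 4, 5, 5],
]

alpha_a = cronbach_alpha(group_a)
alpha_b = cronbach_alpha(group_b)
print(f"alpha, group A: {alpha_a:.2f}")
print(f"alpha, group B: {alpha_b:.2f}")
```

With these invented numbers, the estimate is high for the heterogeneous group (roughly 0.96) and close to zero for the restricted-range group, even though the items are identical. The same instrument thus yields very different score reliability depending on who was measured.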
Similarly, Gronlund and Linn (1990) stated that “reliability refers to the results obtained with an evaluation instrument and not to the instrument itself... it is more appropriate to speak of the reliability of the ‘test scores’ or of the ‘measurement’ than of the ‘test’ or the ‘instrument’” (p. 78). For validity, the situation is similar. The well-known and widely followed “Standards for educational and psychological testing” states that “validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME 1999). In simpler terms, validity refers to the appropriate use of test scores in a given measurement situation, not to the test or instrument itself.

Why is it an issue when we make a statement such as “the test is reliable” or “the test is valid”? First, this is sloppy use of language. Although it may appear innocuous to many, such language in the context of education research and measurement is likely to lead to misconceptions, to conscious or unconscious assumptions, and to incorrect research practice. As Thompson (2003) argued, sloppy speaking may lead to sloppy thinking, to sloppy practice, and to incorrect assumptions about tests and measurement instruments. The “incorrect assumptions that tests themselves are reliable can lead to insufficient attention to the impacts of measurement integrity on the integrity of substantive research conclusions” (p. 94). Second, if a researcher is fully aware that reliability (or validity) concerns the test scores or measurement data in his or her study, not the test itself, then reliability must be assessed for the measurement data in that study, even when a well-known and well-established instrument (e.g., Wechsler Adult Intelligence Scale,

X. Fan, Faculty of Education, University of Macau, Macao, China. E-mail: xtfan@umac.mo
