Abstract

Validity and reliability, as we all learned in our first research methods class, are two of the most important qualities of any test, measurement, or assessment. Compared with validity, reliability is actually more important, since without it there would be no validity. Because reliability is so important, almost all research journals today have some articles related to reliability. Unfortunately, many of these articles fail to report one important piece of information about reliability: its type. In addition, when the type of reliability is reported, it is often not supported by the study design. To fully understand why reporting the type of reliability and the related study design is important, a short review of the definition of reliability, its types, and their relationship with errors may be helpful.
Reliability is popularly defined as “the consistency of measurements when the testing procedure is repeated”. Assume that a test taker took a test once and that there is no change in the ability or underlying trait being measured; then suppose that the same test was administered again to that same test taker. One would expect the scores from these two trials to be quite similar. If not, the test would be unreliable. According to classical test theory, if we administer one test many times to a test taker, this person’s test scores, known as the observed scores, will not be the same every time. If we plot the scores in a frequency distribution, then, assuming that there is no learning or fatigue effect, this distribution should look like a normal distribution, with most scores close to the center (mean) of the score distribution and a few very large or very small scores (Fig. 1). The mean in this case represents the level of ability or intrinsic traits of the test taker, which is known as the “true score”, “universe score”, or “ability/trait”, depending on the testing theory employed.
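
The repeated-testing thought experiment can be sketched as a small simulation. This is only an illustration under stated assumptions: the true score of 50, the error standard deviation of 5, the number of trials, and the function name `simulate_observed_scores` are hypothetical and not taken from the article, and the errors are assumed to be independent and normally distributed, with no learning or fatigue effect.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_trials, seed=0):
    """Return n_trials observed scores for one test taker (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: each administration deviates from the true score by an
    # independent, zero-mean, normally distributed amount.
    return [rng.gauss(true_score, error_sd) for _ in range(n_trials)]


# Hypothetical values chosen only for illustration.
scores = simulate_observed_scores(true_score=50.0, error_sd=5.0, n_trials=10_000)

mean_score = statistics.mean(scores)
sd_score = statistics.stdev(scores)
# Share of observed scores lying within one standard deviation of the mean.
within_one_sd = sum(abs(s - mean_score) <= sd_score for s in scores) / len(scores)

print(f"mean of observed scores: {mean_score:.2f}")
print(f"SD of observed scores: {sd_score:.2f}")
print(f"share within one SD of the mean: {within_one_sd:.2f}")
```

Under these assumptions, the mean of the observed scores should land close to the simulated true score, and most scores should cluster near the center of the distribution.
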
The distance between an observed score and the true score is often called “error”, which could represent natural variations in the ability being measured or may be caused by some sort of systematic error. Thus, any observed score can be conceptually considered to have two parts: a true score plus an error. When the error is zero, the observed score (X1 in Fig. 1) will be equal to the true score. The observed score X2 has a slightly larger error on the positive side of the true score, whereas the observed score X3 has a much larger error, but on the negative side. A true score is unknown in real life, but it can be estimated by determining the measurement error and subtracting it from the obtained score. The relationship among the observed score, true score, and error can therefore be summarized as: observed score (X) = true score (T) + error (E). Many factors can contribute to the error or variability: a test taker may try harder, be more anxious, be in better health, or simply make a lucky guess. Since most of these variations function randomly and do not apply to every test taker, they are called “random errors”. In contrast, other variations could be caused by a systematic problem: for example, a mechanical problem with the scale used in a retest, or retest physical activity data collected during the weekend rather than on weekdays, as was done during the first data collection. This kind of error is called a “systematic error” because it will apply to every individual test taker. With careful design, the magnitude of the systematic error can be detected (e.g., by collecting physical activity data on both weekdays and weekends). In contrast, the magnitude of random errors cannot be detected because they are random, inconsistent, and unpredictable in nature. In most reliability studies, only a simple test-retest design is employed; therefore, both errors are confounded.
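
The decomposition X = T + E, and the reason a single test-retest difference cannot be split into its random and systematic parts, can be illustrated with a minimal sketch. The `Observation` class and all numeric values below are hypothetical, chosen only for illustration; in real data only the observed score is known, not its components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One administration of a test (hypothetical illustration)."""
    true_score: float              # T: unknown in real life
    random_error: float            # varies unpredictably between trials
    systematic_error: float = 0.0  # applies equally to every test taker

    @property
    def observed(self) -> float:
        # X = T + E, with E written here as random + systematic parts
        return self.true_score + self.random_error + self.systematic_error


# Scenario A: the retest carries a systematic error of +4 (e.g., a faulty scale).
test_a = Observation(true_score=60.0, random_error=1.0)
retest_a = Observation(true_score=60.0, random_error=-2.0, systematic_error=4.0)

# Scenario B: no systematic error; the retest differs by random error alone.
test_b = Observation(true_score=60.0, random_error=1.0)
retest_b = Observation(true_score=60.0, random_error=2.0)

for label, test, retest in [("A", test_a, retest_a), ("B", test_b, retest_b)]:
    diff = retest.observed - test.observed
    print(f"scenario {label}: test={test.observed}, retest={retest.observed}, diff={diff}")
```

Both scenarios produce the same observed pair of scores (61 and 62), so the observed scores alone do not reveal how much of the difference is systematic and how much is random.
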
A difference may be observed between the test and retest scores, but you will not be able to determine the relative contributions of the random and systematic errors. To do so, a more complex design is needed.

E-mail address: weimozhu@illinois.edu
Peer review under responsibility of Shanghai University of Sport.
