Reliability: What type, please!

Weimo Zhu

doi:10.1016/j.jshs.2012.11.001

Abstract

Validity and reliability, as we all learned in our first research methods class, are two of the most important qualities of any test, measurement, or assessment. When compared with validity, reliability is actually more important, since without it there would be no validity. Because reliability is so important, almost all research journals today have some articles related to reliability. Unfortunately, many of these articles fail to report one important piece of information regarding reliability: its type. In addition, if the type of reliability is reported, it is often not supported by the study design. To fully understand why reporting the type of reliability and the related study design is important, a short review of the definition of reliability, its types, and their relationship with errors may be helpful.
Reliability is popularly defined as “the consistency of measurements when the testing procedure is repeated”. Assume that a test taker did a test once and there is no change in the ability or underlying trait being measured; then suppose that the same test was administered again to that same test taker. One would expect the scores from these two trials to be quite similar. If not, the test would be unreliable. According to classical test theory, if we administer one test many times to a test taker, this person’s test scores, known as the observed scores, will not be the same every time. If we plot the scores in a frequency distribution, then, assuming that there is no learning or fatigue effect, this distribution should look like a normal distribution, with most scores close to the center (mean) of the score distribution and a few very large or very small scores (Fig. 1). The mean in this case represents the level of ability or intrinsic trait of the test taker, which is known as the “true score”, “universe score”, or “ability/trait”, depending on the testing theory employed.
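
The idea of repeated administrations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true score of 50, the error standard deviation of 5, the number of trials, and the function name `simulate_observed_scores` are all hypothetical values chosen for illustration, and the errors are assumed to be normally distributed with no learning or fatigue effect.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_trials, seed=0):
    """Return n_trials observed scores for one test taker.

    Each observed score is the (hypothetical) true score plus a
    normally distributed error with mean 0 and the given SD.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_trials)]


# Hypothetical values, not taken from the article.
scores = simulate_observed_scores(true_score=50.0, error_sd=5.0, n_trials=10_000)

# With many repetitions, the mean of the observed scores should sit
# close to the true score, while individual scores scatter around it.
mean_score = statistics.mean(scores)
print("mean of observed scores:", round(mean_score, 2))
print("lowest and highest observed scores:", round(min(scores), 2), round(max(scores), 2))
```

In practice a test cannot be repeated thousands of times without learning or fatigue, so the simulation only illustrates the concept of a score distribution centered on the true score.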
The distance between an observed score and the true score is often called “error”, which could represent natural variations in the ability being measured or may be caused by some sort of systematic error. Thus, any observed score can be conceptually considered to have two parts: a true score plus an error. When the error is zero, the observed score (X1 in Fig. 1) will be equal to the true score. A true score is unknown in real life, but it can be estimated by determining the measurement error and subtracting it from the obtained score. The observed score X2 has a slightly larger error on the positive side of the true score, whereas the observed score X3 has a much larger error, but on the negative side. The relationship among the observed score, true score, and error can therefore be summarized as: observed score (X) = true score (T) + error (E). Many factors can contribute to the error or variability: a test taker may try harder, be more anxious, be in better health, or simply make a lucky guess. Since most of these variations function randomly and do not apply to every test taker, they are called “random errors”. In contrast, other variations could be caused by a systematic source: for example, a mechanical problem with the scale when it was used in the retest, or retest data collected during the weekend when the first physical activity data were collected on weekdays. This kind of error is called a “systematic error” because it will apply to every individual test taker. With careful design, the magnitude of the systematic error can be detected (e.g., collecting physical activity data on both weekdays and weekends). In contrast, the magnitude of random errors cannot be detected because they are random, inconsistent, and unpredictable in nature. In most reliability studies, only a simple test-retest design is employed; therefore, both errors are confounded.
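
The decomposition X = T + E in a simple test-retest design can be sketched as follows. Everything numeric here is a hypothetical assumption for illustration: three true scores, a random-error standard deviation of 3, and a systematic shift of 2 units applied only at the retest (for instance, a scale that reads high). The function name `simulate_test_retest` is likewise invented for this sketch.

```python
import random


def simulate_test_retest(true_scores, random_sd, systematic_shift, seed=0):
    """Simulate one test and one retest score per test taker.

    test   = T + random error
    retest = T + systematic shift + random error
    The components are stored only because this is a simulation;
    a real study would observe just the test and retest scores.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    records = []
    for t in true_scores:
        e_test = rng.gauss(0.0, random_sd)
        e_retest = rng.gauss(0.0, random_sd)
        records.append({
            "test": t + e_test,
            "retest": t + systematic_shift + e_retest,
            "systematic_part": systematic_shift,
            "random_part": e_retest - e_test,
        })
    return records


# Hypothetical values, not taken from the article.
records = simulate_test_retest([40.0, 50.0, 60.0], random_sd=3.0, systematic_shift=2.0)

for r in records:
    observed_diff = r["retest"] - r["test"]
    # The observed difference is the sum of both error sources; the
    # researcher sees only this sum, not the two parts separately.
    print(round(observed_diff, 2), "=",
          r["systematic_part"], "+", round(r["random_part"], 2))
```

In the simulation the two parts are known because they were generated separately; with only one test and one retest score per person, each observed difference arrives as a single number that mixes both sources.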
There may be a variation observed between the test and retest scores, but you will NOT be able to determine the proportion or contribution of the random versus systematic errors. To do so, a more complex design is needed.

E-mail address: weimozhu@illinois.edu
Peer review under responsibility of Shanghai University of Sport.


Journal: Journal of Sport and Health Science	Publication Date: Nov 21, 2012
Citations: 10	License type: cc-by-nc-nd


