Abstract

A number of articles published in the June issue of World Journal of Surgery display data with very large standard deviations (SDs), at times even larger than the mean. Sharma et al. [1] report that the preoperative parathyroid hormone (PTH) values for their two groups of patients, one with preoperative dual-energy X-ray absorptiometry (DEXA) and one with both pre- and postoperative DEXA scans, were 166 ± 171 and 173 ± 165 pg/ml, respectively. Parikh et al. [2] state that the follow-up periods for patients with biochemically mild and conventional primary hyperparathyroidism were 33.0 ± 38.9 and 173 ± 165 months, respectively. Is it possible to have an SD so large that it exceeds the mean? Yes, this is possible. The SD is a measure of the variability in a given data set. It can be thought of as the average distance of a set of values from the mean value [3, 4]. The SD may be larger than the mean if the data contain negative values or if the data set has extreme outlier values. Negative values usually are not found in biomedical data sets. Can patients have a negative follow-up period, and can the serum PTH value be negative? Certainly not. For follow-up periods of 2, 4, 6, 8, and 500 months, the mean would be 104 months and the SD would be 221 months. The SD exceeds the mean here because the fifth patient has an unexpectedly long follow-up period of 500 months. Whenever we find this situation, we must screen our data set for outliers. Looking at the minimum and maximum values would reveal them. In other words, data cleaning is an indispensable part of data analysis and helps to avoid these errors. If the SD is still large in a cleaned data set, the data are better presented as the median (interquartile range) because they form a nonparametric data set, in contrast to a parametric data set, which is presented as mean ± SD [5].
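The follow-up example above can be checked with a short script. This is only a minimal sketch using Python's standard `statistics` module; the function name `summarize` and the choice of the "inclusive" quartile method are assumptions made for illustration, and other quartile conventions give different interquartile ranges for so small a sample.

```python
import statistics

# Follow-up periods in months, taken from the example in the text.
follow_up = [2, 4, 6, 8, 500]


def summarize(values):
    """Return the descriptive statistics discussed in the text.

    `summarize` is a hypothetical helper name, not from the source.
    """
    # Quartile conventions differ between packages; "inclusive" is
    # an assumption chosen here for illustration.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),  # the minimum and maximum expose outliers
        "max": max(values),
    }


summary = summarize(follow_up)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The mean (104) and SD (about 221) are dominated by the single value of 500 months, whereas the median of 6 months stays close to the other four observations.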
Some authors use the standard error of the mean (SEM) to conceal a large SD, trying to imply incorrectly that their observations are more accurate, because the SEM is always smaller than the SD for a sample of more than one observation (SEM = SD/√n, where n is the sample size). It must be remembered that the SEM is used to derive the confidence interval, which gives a range of values around the sample mean for estimating the "true" (population) mean with a given level of certainty. Therefore, the SEM gives us an idea of the accuracy of the mean and does not indicate the dispersion of data in a given sample. Statistics is indispensable for biomedical research. Researchers need not be experts in statistics, but education in basic statistics would enable them to understand the statistical requirements of their research. This would help them communicate better with their statisticians and, later, with readers through well-written reports.
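The relation between the SD and the SEM can be illustrated with the same five follow-up values. The sketch below is illustrative only: the helper name `sem` and the constant `T_CRIT_DF4` are assumptions, and the t critical value (2.776 for a two-sided 95% interval with 4 degrees of freedom) is hard-coded rather than computed.

```python
import math
import statistics

# Follow-up periods in months, taken from the example in the text.
follow_up = [2, 4, 6, 8, 500]

# Two-sided 95% t critical value for n - 1 = 4 degrees of freedom.
# Hard-coded assumption; it applies only to a sample of five values.
T_CRIT_DF4 = 2.776


def sem(values):
    """Standard error of the mean: sample SD / sqrt(n). Hypothetical helper."""
    return statistics.stdev(values) / math.sqrt(len(values))


mean = statistics.mean(follow_up)
sd = statistics.stdev(follow_up)
se = sem(follow_up)

# The confidence interval describes the uncertainty of the mean,
# not the spread of the individual observations.
ci = (mean - T_CRIT_DF4 * se, mean + T_CRIT_DF4 * se)

print(f"SD  = {sd:.1f} months")
print(f"SEM = {se:.1f} months")
print(f"95% CI of the mean = ({ci[0]:.1f}, {ci[1]:.1f}) months")
```

Reporting the SEM (about 99) in place of the SD (about 221) makes the data look less variable than they are. The interval also extends below zero, which is a further sign that the mean is a poor summary of these values.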
