Dealing With Non-normal Data

Kristin L Sainani

doi:10.1016/j.pmrj.2012.10.013

Abstract

Although some continuous variables follow a normal, or bell-shaped, distribution, many do not. Non-normal distributions may lack symmetry, may have extreme values, or may have a flatter or steeper “dome” than a typical bell. There is nothing inherently wrong with non-normal data; some traits simply do not follow a bell curve. For example, data on coffee and alcohol consumption are rarely bell shaped. Instead, they follow a right-skewed distribution: a cluster of values at zero (nonconsumers), another cluster in the low-to-moderate range, and a few extreme values to the right (heavy consumers). Researchers need to know whether their variables follow normal or non-normal distributions, because this influences how data are described and analyzed. Non-normal variables, particularly those with extreme right or left tails, may be better summarized (described) with medians and percentiles than with means and standard deviations. Standard statistical tests for analyzing continuous data (t-test, analysis of variance [ANOVA], linear regression) may also perform poorly on non-normal data, but only when the sample size is small. In those cases, alternative statistical approaches may be warranted. This article reviews how to spot, describe, and analyze non-normal data, and clarifies when the “normality assumption” matters and when it is unimportant.
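To make the point about summary statistics concrete, the following is a minimal sketch in Python using only the standard library. The data set is invented for illustration (hypothetical drinks per week for 15 people) and is not taken from the article; the function name summarize is likewise an assumption. It computes both kinds of summary for a right-skewed variable so they can be compared side by side.

```python
import statistics

# Hypothetical drinks-per-week values for 15 people (invented for illustration):
# a cluster at zero, a low-to-moderate group, and a few heavy consumers.
drinks = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 7, 14, 40]


def summarize(values):
    """Return mean/SD alongside median/quartiles for a list of numbers."""
    # quantiles() with n=4 returns the three quartile cut points (Q1, Q2, Q3);
    # "inclusive" treats the data as spanning the full range of the sample.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


summary = summarize(drinks)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample, the two heavy consumers pull the mean (about 5.5) well above the median (2), and the standard deviation exceeds the mean, so "mean minus one standard deviation" would imply negative consumption. The median and quartiles are unaffected by how extreme the largest values are, which is why they may describe such a variable more faithfully.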

Full Text