Statistical methods in epidemiology. VI. Correlation and regression: the same or different?

Alan S Rigby

doi:10.1080/09638280050207857

Abstract

Purpose: The statistical terms 'correlation' and 'regression' are frequently mistaken for each other in the scientific literature. Why this is so is unclear. This paper discusses their differences and similarities, arguing that in most circumstances regression is the more appropriate technique to use, since regression incorporates a notion of dependency of one variable on another.

Method: Pearson's correlation coefficient (r) is introduced as a method for estimating the degree of linear association between two normally distributed variables. The problem of 'least squares' regression (when y depends on x) is introduced by considering the best-fitting straight line through the points on a scatter plot.

Results: Correlation, regression analysis and residual estimation are discussed using examples from the author's own teaching experience.

Conclusions: Correlation and regression share some similarities. However, regression is the better technique to use because it carries a notion of dependency of one variable upon another. Regression model checking includes residual examination. The importance of plotting and examining residuals cannot be overemphasized. Residual examination should become as much a part of a regression analysis as the estimation of the regression coefficients themselves.
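The distinction drawn in the abstract can be illustrated with a minimal sketch in Python. It computes Pearson's r, fits a least squares line of y on x, and returns the residuals that would then be plotted and inspected. The function names (`pearson_r`, `least_squares`, `residuals`) and the data are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import math


def _centred_sums(x, y):
    """Return (sxx, syy, sxy, mean_x, mean_y) for two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy, mx, my


def pearson_r(x, y):
    """Pearson's correlation coefficient: symmetric in x and y."""
    sxx, syy, sxy, _, _ = _centred_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares(x, y):
    """Least squares line for y depending on x: returns (intercept, slope)."""
    sxx, _, sxy, mx, my = _centred_sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed y minus fitted y for each point."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Made-up illustrative data, not from the paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
intercept, slope = least_squares(x, y)
res = residuals(x, y, intercept, slope)

print(f"r = {r:.4f}")
print(f"y = {intercept:.3f} + {slope:.3f} * x")
print("residuals:", [round(e, 3) for e in res])
```

Swapping the arguments leaves `pearson_r` unchanged but gives a different fitted line from `least_squares`, which is one way to see the dependency of y on x that regression assumes and correlation does not. In practice the residuals would be plotted against x or against the fitted values to check the model.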
