Cautionary Remarks on the Use of Clusterwise Regression

Michael J Brusco, J Dennis Cradit, Douglas Steinley, Gavin L Fox

doi:10.1080/00273170701836653

Abstract

Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process. In some cases, most of the variation in the response variable is explained by clustering the objects, with little additional benefit provided by the within-cluster regression models. Accordingly, there is tremendous potential for overfitting with clusterwise regression, which is demonstrated with numerical examples and simulation experiments. To guard against the misuse of clusterwise regression, we recommend a benchmarking procedure that compares the results for the observed empirical data with those obtained across a set of random permutations of the response measures. We also demonstrate the potential for overfitting via an empirical application related to the prediction of reflective judgment using high school and college performance measures.
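The permutation benchmark described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names (`clusterwise_sse`, `permutation_benchmark`), the simple alternating refit-and-reassign heuristic used in place of whatever fitting algorithm the authors employed, and the choice to report the proportion of the total sum of squares explained.

```python
import numpy as np

def _design(X):
    # Design matrix with an intercept column (hypothetical helper).
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(len(X)), X])

def _total_sse(A, y, labels, k):
    # Sum of within-cluster OLS error sums of squares for a given partition.
    sse = 0.0
    for c in range(k):
        m = labels == c
        if m.any():
            beta = np.linalg.lstsq(A[m], y[m], rcond=None)[0]
            sse += float(np.sum((y[m] - A[m] @ beta) ** 2))
    return sse

def clusterwise_sse(X, y, k, n_starts=5, max_iter=30, rng=None):
    # Heuristic only: alternate between fitting one regression per cluster
    # and moving each object to the cluster whose model fits it best.
    rng = np.random.default_rng(rng)
    A, y = _design(X), np.asarray(y, dtype=float)
    n, p = A.shape
    best = np.inf
    for _ in range(n_starts):
        labels = rng.integers(k, size=n)  # random initial partition
        for _ in range(max_iter):
            sq = np.full((n, k), np.inf)
            for c in range(k):
                m = labels == c
                if m.sum() >= p:  # skip clusters too small to fit
                    beta = np.linalg.lstsq(A[m], y[m], rcond=None)[0]
                    sq[:, c] = (y - A @ beta) ** 2
            new_labels = sq.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        best = min(best, _total_sse(A, y, labels, k))
    return best

def permutation_benchmark(X, y, k, n_perm=20, seed=0):
    # Compare variance explained on the observed response with the same
    # quantity after randomly permuting the response across objects.
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    sst = float(np.sum((y - y.mean()) ** 2))
    observed = 1.0 - clusterwise_sse(X, y, k, rng=rng) / sst
    permuted = np.array([
        1.0 - clusterwise_sse(X, rng.permutation(y), k, rng=rng) / sst
        for _ in range(n_perm)
    ])
    return observed, permuted

# Usage on pure-noise data, where X carries no information about y.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(60, 1))
y_demo = demo_rng.normal(size=60)
obs, perm = permutation_benchmark(X_demo, y_demo, k=2, n_perm=10)
print(f"observed: {obs:.3f}, permuted mean: {perm.mean():.3f}")
```

If the observed value is not clearly larger than the values obtained from the permuted responses, the apparent fit is plausibly due to the clustering step alone rather than to the within-cluster regression models, which is the overfitting risk the abstract warns about.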

Journal: Multivariate Behavioral Research
Publication Date: Mar 19, 2008
Citations: 62

Similar Papers

CORD: A program to compute the directional correlation coefficient d
John E Ettlie
Behavior Research Methods & Instrumentation | VOL. 7
01 Nov 1975

QUICLSTR: A FORTRAN program for hierarchical cluster analysis with large numbers of subjects
Philip A Bell ... John L Korey
Behavior Research Methods & Instrumentation | VOL. 7
01 Nov 1975

Map Generation in High-Value Horticultural Integrated Pest Management: Appropriate Interpolation Methods for Site-Specific Pest Management of Colorado Potato Beetle (Coleoptera: Chrysomelidae)
Randall Weisz ... Zane Smilowitz
Journal of Economic Entomology | VOL. 88
01 Dec 1995

Distribution of the biased hypothesis sum of squares in linear models with missing observations
Anant M Kshirsagar ... Sheela Deo
Communications in Statistics - Theory and Methods | VOL. 18
01 Jan 1989
