Abstract

Dear Editor-in-Chief Pietro E. di Prampero,

My colleagues and I recently read the study by Sanada et al. (2007) on the development of prediction models for maximal oxygen uptake (V̇O2max). We would like to focus our comments on two particular areas of the study: (1) the use of stepwise regression, and (2) the practical application of the prediction equations.

1. Use of stepwise regression: Stepwise regression allows a computer program to select a small set of the ‘best’ predictors from a larger set of potential predictors (Tabachnick and Fidell 2001). Stepwise procedures should not be used to develop prediction models because this method produces an inflated R-squared (R²) and inaccurate tests of statistical significance, and it does not maximize the theoretical or practical value of the model (Berger 2004; Keppel and Wickens 2004). An essential problem is that estimates of population multiple correlations and tests of statistical significance fail to take into account how many variables were considered in the stepwise analysis. Inflation occurs whether the experimenter selects predictors after looking at the correlations or stepwise regression is used to select the ‘best’ predictors out of a larger set of potential predictors (Cohen et al. 2003).

A more realistic estimate of the population multiple correlation is the ‘shrunken’ R² based on the total number of variables considered. In the Sanada et al. (2007) study, where the two strongest predictors from a set of 15 potential predictors produced an R² of 0.72 with a sample of N = 40, an estimate of the population multiple R² based on 15 predictors is the shrunken R² of 0.55. Contrary to the conclusions of Sanada et al. (2007) based on their inflated R², their model offers no improvement on models generated in larger studies, as shown in their Table 5.

Ordinarily, a regression formula generated on one sample will produce a smaller R² when it is applied to a new sample (Pedhazur 1997). Thus, it is surprising that Sanada et al. (2007) found R² to be larger in the validation group (R² = 0.83) than in the derivation group (R² = 0.72) for which the model was generated. Perhaps this can be explained by large sampling error due to the extremely small sample size (N = 20) for the validation group.

In practice, it is always preferable for the investigator to control the order of entry of predictor variables based on theoretical considerations (Berger 2004). This procedure is called “hierarchical analysis,” and it requires the investigator to plan the analysis with care, prior to looking at the data. The double advantage of hierarchical methods over stepwise methods is that there is less capitalization on chance, and careful choice of the order of entry of predictors assures that results such as R² added are maximally interpretable (Berger 2004). Kerlinger (1986) stated that “... the research problem and the theory behind the problem should determine the order of entry of variables in multiple regression analysis” (p. 545). For example, Malek et al. (2004b, 2005) used hierarchical analysis to develop nonexercise-based

M. H. Malek (✉)
Human Performance Laboratory, Department of Nutrition and Health Sciences, University of Nebraska-Lincoln, 110 Ruth Leverton Hall, Lincoln, NE 68583-0806, USA
e-mail: mmalek@unlserve.unl.edu
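
The shrunken R² of 0.55 quoted in the letter can be reproduced with the standard adjusted R² formula, 1 - (1 - R²)(N - 1)/(N - k - 1). The letter does not say which shrinkage formula was used, so the choice of formula here is an assumption, and the function name `shrunken_r2` is purely illustrative. The sketch below compares the adjustment for all 15 candidate predictors with the adjustment for only the 2 predictors that were retained.

```python
def shrunken_r2(r2: float, n: int, k: int) -> float:
    """Adjusted (shrunken) R^2 for a sample of size n and k predictors.

    Assumes the common adjustment 1 - (1 - R^2) * (n - 1) / (n - k - 1);
    the letter does not state which formula its authors applied.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the adjustment to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values taken from the letter: R^2 = 0.72, N = 40.
r2_observed = 0.72
n_derivation = 40

# Adjusting for all 15 predictors that were considered (approximately 0.545).
adjusted_all_considered = shrunken_r2(r2_observed, n_derivation, k=15)

# Adjusting only for the 2 predictors that stepwise selection retained.
adjusted_retained_only = shrunken_r2(r2_observed, n_derivation, k=2)

print(f"adjusted for 15 considered predictors: {adjusted_all_considered:.3f}")
print(f"adjusted for 2 retained predictors:    {adjusted_retained_only:.3f}")
```

Under this assumed formula, counting only the two retained predictors leaves the estimate close to the reported 0.72, while counting all 15 candidates brings it down to roughly the 0.55 given in the letter, which is the inflation the letter describes.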
