Abstract

Although I support the contention that concurrent use of tree analysis and regression analysis may offer advantages to the researcher, I would like to raise a few questions about the November 1970 JMR article by Armstrong and Andress, "Analysis of Marketing Data: Trees vs. Regression." When conducted with understanding, regression has great utility that, I believe, is not fairly represented in that comparison.

First, the authors claim that dummy coding in regression analysis, which allows numerical analysis of attribute information, is very cumbersome (p. 487). Dummy variables need be no more cumbersome than any other variable; I find dummy coding indispensable. It is the basis of the analysis of variance, a technique many researchers rely upon.

Second, the authors do not sufficiently stress the superior ability of regression analysis to measure relationships rather than cell means (p. 487). A complete (as contrasted with exploratory) investigation by regression could be used directly for cost analysis or for predicting the effect of changing the characteristics of stations. An equation of the tree, if constructed, could of course be used similarly. In measuring these relationships, questions of interpretation arise. For instance, what are the standard errors of the estimates shown in the tree of Figure 3? Standard errors are routinely obtained in regression (see Table 2). Further, the rating variable, a five-category polytomy, is represented by four values in the tree analysis but has only one coefficient in the regression model. Does the regression coefficient represent the difference in response between one category and the other four, or has this subjective description been numerically scaled? Why is type-of-area missing from the tree when its inclusion in the regression model seems important and is statistically significant?

Third, the guiding rule (p. 487) of a minimum sample size of 10 observations per variable may be adequate for exploratory regression, but where the model has been established and precise estimates of the coefficients are wanted, the proper sample size may differ in either direction. It depends on (1) the residual error variance, (2) the degree of intercorrelation in the data set, and (3) the specified degree of precision.

Fourth, the restrictiveness of assumptions in regression analysis is seriously misrepresented (p. 488). In considering this question, it is important to distinguish between properties of models and properties of data. The properties discussed are model properties but are referred to as data problems. The first two problems, interaction and nonlinear effects, are problems only when one is not sure about their existence; one is then limited in using associated techniques (e.g., establishing confidence intervals) that depend on the assumption that the correct model is being used. Exploratory investigations are conducted to check for such interactions, curvilinearities, or discontinuities, which then need to be structured into the model. Holding out a validation subsample helps to avoid overfitting when such structures are used.
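The points about dummy coding and standard errors can be illustrated with a minimal sketch. The data below are hypothetical (not drawn from the Armstrong and Andress study): a five-category rating, like the polytomy discussed above, is expanded into four indicator columns with the first category as the reference level, and ordinary least squares then yields a coefficient and a standard error for each contrast.

```python
import numpy as np

# Hypothetical data: a five-category rating and a numeric response.
rng = np.random.default_rng(0)
n = 50
rating = rng.integers(1, 6, size=n)            # categories 1..5
y = 2.0 + 0.5 * rating + rng.normal(0, 1, n)   # response with noise

# Dummy (one-hot) coding: four indicator columns for categories 2..5,
# category 1 as the reference level. This is the coding under which
# regression reproduces a one-way analysis of variance.
X = np.column_stack(
    [np.ones(n)] + [(rating == k).astype(float) for k in (2, 3, 4, 5)]
)

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and standard errors of the coefficients --
# the quantities noted above as routine in regression output.
resid = y - X @ beta
df = n - X.shape[1]
s2 = resid @ resid / df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

for name, b, s in zip(["intercept", "cat2", "cat3", "cat4", "cat5"], beta, se):
    print(f"{name}: {b:.3f} (SE {s:.3f})")
```

Each of the four dummy coefficients estimates the difference in mean response between its category and the reference category, so the five-level polytomy is carried by four coefficients rather than one, and each comes with its own standard error.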

