Abstract

Interest in classification and regression tree (CART) analysis has grown in recent years. A tree-based regression model is constructed by recursively partitioning the data, choosing at each step the split that yields the maximum reduction in the variability of the response. Unfortunately, this exhaustive search can bias variable selection: it tends to choose as a splitter a categorical variable that has many distinct values. This study introduces generalized unbiased interaction detection and estimation (GUIDE), an unbiased tree-based regression model, for its robustness against variable selection bias. The underlying theoretical differences between CART and GUIDE in variable selection are presented, and the outcomes of the two tree-based regression models are compared and analyzed with intersection inventory and crash data. The results underscore GUIDE's strength in selecting variables without bias. A simulation shed additional light on the negative impact that resulted when an algorithm was inappropriately applied to the data. The paper concludes by addressing the strengths and weaknesses of the two hierarchical tree-based regression models, CART and GUIDE, and, more important, the differences between them, and it advises on the appropriate application of each. It is anticipated that the GUIDE model will provide a new perspective for users of tree-based models and will offer an advantage over existing methods. Users in transportation should choose the method appropriate to their data and utilize it to their advantage.
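
The selection bias described above can be illustrated with a small simulation. The sketch below is not the paper's simulation and does not implement GUIDE. It is a minimal illustration under assumed settings: a pure-noise response, one two-level and one ten-level categorical predictor, 100 observations, and hypothetical function names. It applies the exhaustive least-squares search to both predictors and counts how often the ten-level variable offers the larger reduction in the sum of squared errors, even though neither predictor is related to the response.

```python
import random
from collections import defaultdict


def best_sse_reduction(x, y):
    """Largest reduction in the sum of squared errors (SSE) achievable by
    any binary split of the categorical predictor x.

    For a least-squares criterion, the best binary partition of the
    categories is contiguous once the categories are ordered by their mean
    response, so scanning that ordering covers the exhaustive search.
    """
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    # (mean, count, sum) for each category, ordered by mean response
    stats = sorted((sum(v) / len(v), len(v), sum(v)) for v in groups.values())
    n, total = len(y), sum(y)
    best = 0.0
    n_left = 0
    s_left = 0.0
    for _, count, s in stats[:-1]:  # the last cut would leave the right side empty
        n_left += count
        s_left += s
        n_right = n - n_left
        s_right = total - s_left
        # SSE(parent) - SSE(left) - SSE(right), written with sums only
        gain = s_left ** 2 / n_left + s_right ** 2 / n_right - total ** 2 / n
        best = max(best, gain)
    return best


def share_many_level_wins(n_trials=500, n=100, seed=0):
    """Fraction of trials in which the ten-level predictor beats the
    two-level predictor, with a response that depends on neither."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        y = [rng.gauss(0, 1) for _ in range(n)]       # pure noise response
        x_two = [rng.randrange(2) for _ in range(n)]  # 2 distinct values
        x_ten = [rng.randrange(10) for _ in range(n)]  # 10 distinct values
        if best_sse_reduction(x_ten, y) > best_sse_reduction(x_two, y):
            wins += 1
    return wins / n_trials


if __name__ == "__main__":
    print(share_many_level_wins())
```

Under these assumptions, a selector with no bias would choose each predictor in about half of the trials. The exhaustive search is expected to choose the ten-level predictor far more often, because it has nine candidate cut points to optimize over and the two-level predictor has only one.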

