LOTUS: An Algorithm for Building Accurate and Comprehensible Logistic Regression Trees

Kin-Yee Chan, Wei-Yin Loh

doi:10.1198/106186004x13064

Abstract

Logistic regression is a powerful technique for fitting models to data with a binary response variable, but the models are difficult to interpret if collinearity, nonlinearity, or interactions are present. Besides, it is hard to judge model adequacy because there are few diagnostics for choosing variable transformations and no true goodness-of-fit test. To overcome these problems, this article proposes to fit a piecewise (multiple or simple) linear logistic regression model by recursively partitioning the data and fitting a different logistic regression in each partition. This allows nonlinear features of the data to be modeled without requiring variable transformations. The binary tree that results from the partitioning process is pruned to minimize a cross-validation estimate of the predicted deviance. This obviates the need for a formal goodness-of-fit test. The resulting model is especially easy to interpret if a simple linear logistic regression is fitted to each partition, because the tree structure and the set of graphs of the fitted functions in the partitions comprise a complete visual description of the model. Trend-adjusted chi-square tests are used to control bias in variable selection at the intermediate nodes. This protects the integrity of inferences drawn from the tree structure. The method is compared with standard stepwise logistic regression on 30 real datasets, with several containing tens to hundreds of thousands of observations. Averaged across the datasets, the results show that the method reduces predicted mean deviance by 9% to 16%. We use an example from the Dutch insurance industry to demonstrate how the method can identify and produce an intelligible profile of prospective customers.
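
The core idea in the abstract, fitting a separate simple linear logistic regression in each partition and comparing deviances, can be illustrated with a minimal Python sketch. This is not the LOTUS algorithm: it makes a single split at the median of one predictor (an assumption made for brevity) and omits the trend-adjusted chi-square variable selection and the cross-validation pruning. All function names (`fit_simple_logistic`, `fit_one_split`, `make_data`) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))


def fit_simple_logistic(xs, ys, iters=50):
    """Fit P(y=1|x) = sigmoid(a + b*x) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0                 # gradient of the log-likelihood
        h00 = h01 = h11 = 1e-9        # information matrix, slightly damped
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        # Solve the 2x2 system H * step = g and update the coefficients.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def deviance(model, xs, ys):
    """Binomial deviance: -2 times the log-likelihood of the fitted model."""
    a, b = model
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(a + b * x), 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total


def fit_one_split(xs, ys):
    """Split once at the median of x (illustrative choice, not LOTUS's rule)
    and fit a separate simple logistic regression on each side."""
    cut = sorted(xs)[len(xs) // 2]
    left = [(x, y) for x, y in zip(xs, ys) if x <= cut]
    right = [(x, y) for x, y in zip(xs, ys) if x > cut]
    parts = (left, right)
    models = [fit_simple_logistic(*zip(*part)) for part in parts]
    dev = sum(deviance(m, *zip(*part)) for m, part in zip(models, parts))
    return cut, models, dev


def make_data(n=2000, seed=0):
    """Synthetic data whose success probability peaks at x = 0, so no single
    linear logistic model in x can describe it."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(2.0 - 2.0 * abs(x)) else 0 for x in xs]
    return xs, ys


xs, ys = make_data()
global_model = fit_simple_logistic(xs, ys)
cut, models, dev_piecewise = fit_one_split(xs, ys)
print("single model deviance:", round(deviance(global_model, xs, ys), 1))
print("two-piece deviance:   ", round(dev_piecewise, 1), "split at", round(cut, 3))
```

On data like this, the two fitted slopes have opposite signs, which is the kind of nonlinear feature a single linear logistic model cannot capture without a variable transformation. The deviances here are computed on the training data; the abstract's pruning criterion instead uses a cross-validation estimate of predicted deviance.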

Journal: Journal of Computational and Graphical Statistics
Publication Date: Dec 1, 2004
Citations: 138

Similar Papers

An accurate soft diagnosis method of breast cancer using the operative fusion of derived features and classification approaches
Sunil Kumar Jha ... Jinwei Wang
Expert Systems | VOL. 39
11 Mar 2022

Why generation Y prefers online shopping: a study of young customers of India
Pradip Swarnakar ... Ajay Kumar
International Journal of Business Forecasting and Marketing Intelligence | VOL. 2
01 Jan 2015

In-stent restenosis in acute coronary syndrome-a classic and a machine learning approach.
Alexandru Scafa-Udriște ... Andrei Puiu
Frontiers in Cardiovascular Medicine | VOL. 10
22 Dec 2023

Modification of the Minimum Logit Chi‐Squared Estimator in Simple Linear Logistic Regression
B Hosmane
Biometrical Journal | VOL. 30
01 Jan 1987
