Abstract

Discriminative classifiers tend to have lower asymptotic classification error, while generative classifiers can be more accurate when the training set is small. In this paper, we examine the construction of hybrid models for categorical data, using logistic regression (LR) as the discriminative component and naïve Bayes (NB) as the generative component. We adopt a strategy based on the bias-variance tradeoff, with the objective of minimizing the sum of these two errors. Specifically, the proposed heuristic consists of functions of the training sample size and of the conditional dependence among features, which serve as proxies for model variance and model bias, respectively. We evaluate our method on 25 classification datasets and find that the hybrid model outperforms both pure LR and pure NB, and is competitive with random forest. Although the hybrid model does not beat LASSO in predictive performance, the experimental results suggest that the difference is insignificant when the number of features is small. Moreover, the hybrid model requires less training time than LASSO, which makes it more attractive when training time is a major concern.
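To make the hybrid construction concrete, the sketch below shows one way an NB/LR mixture could be assembled for integer-coded categorical features. This is a minimal illustration, not the paper's method: the mixing weight heuristic here (w = n / (n + k), growing toward LR as the sample size n increases) and the constant k are hypothetical stand-ins; the paper's actual heuristic also incorporates a measure of conditional dependence among features as a bias proxy.

```python
# Minimal sketch of a hybrid NB/LR classifier for categorical data.
# The weight w = n / (n + k) is a HYPOTHETICAL variance proxy: more
# training data -> lower LR variance -> more weight on the
# discriminative component. The paper's heuristic additionally uses
# conditional dependence among features as a bias proxy (not shown).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OneHotEncoder


class HybridNBLR:
    def __init__(self, k=100.0):
        # k (hypothetical) controls how quickly the mix shifts
        # toward LR as the training set grows.
        self.k = k

    def fit(self, X, y):
        # X: integer-coded categorical features, shape (n, d).
        n = X.shape[0]
        self.w_ = n / (n + self.k)
        self.nb_ = CategoricalNB().fit(X, y)
        # LR needs one-hot encoding to treat categories as nominal.
        self.enc_ = OneHotEncoder(handle_unknown="ignore").fit(X)
        self.lr_ = LogisticRegression(max_iter=1000).fit(
            self.enc_.transform(X), y)
        return self

    def predict_proba(self, X):
        p_nb = self.nb_.predict_proba(X)
        p_lr = self.lr_.predict_proba(self.enc_.transform(X))
        # Convex combination of generative and discriminative posteriors.
        return self.w_ * p_lr + (1.0 - self.w_) * p_nb

    def predict(self, X):
        return self.nb_.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

With a small training set, w is close to 0 and the prediction is dominated by the lower-variance NB posterior; as n grows, w approaches 1 and the model converges toward the lower-bias LR posterior, mirroring the tradeoff described in the abstract.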
