Abstract

Classification can be regarded as dividing the data space into decision regions separated by decision boundaries. In this paper we analyze decision tree algorithms and the NBTree algorithm from this perspective. Thus, a decision tree can be regarded as a classifier tree, in which each classifier on a non-root node is trained in decision regions of the classifier on the parent node. Meanwhile, the NBTree algorithm, which generates a classifier tree with the C4.5 algorithm and the naive Bayes classifier as the root and leaf classifiers respectively, can also be regarded as training naive Bayes classifiers in decision regions of the C4.5 algorithm. We propose a second division (SD) algorithm and three soft second division (SD-soft) algorithms to train classifiers in decision regions of the naive Bayes classifier. These four novel algorithms all generate two-level classifier trees with the naive Bayes classifier as root classifiers. The SD and three SD-soft algorithms can make good use of both the information contained in instances near decision boundaries, and those that may be ignored by the naive Bayes classifier. Finally, we conduct experiments on 30 data sets from the UC Irvine (UCI) repository. Experiment results show that the SD algorithm can obtain better generalization abilities than the NBTree and the averaged one-dependence estimators (AODE) algorithms when using the C4.5 algorithm and support vector machine (SVM) as leaf classifiers. Further experiments indicate that our three SD-soft algorithms can achieve better generalization abilities than the SD algorithm when argument values are selected appropriately.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.