Abstract

Despite its unrealistic feature-independence assumption, the naive Bayes classifier is simple yet competes well with more sophisticated classifiers under the zero-one loss function when assigning an observation to a class given the observed features. However, it has been shown that the naive Bayes performs poorly in both estimation and classification in some cases where the features are correlated. Researchers have therefore developed many approaches that free the naive Bayes of this basic but rarely satisfied assumption. In this paper, we propose a new classifier, also free of the independence assumption, that evaluates the dependence of features through pair copulas constructed via a graphical model called the D-vine tree. This tree structure decomposes the multivariate dependence into many bivariate dependencies and thus makes it possible to evaluate the dependence of features easily and efficiently, even for high-dimensional data with large sample sizes. We further extend the proposed method to features with discrete-valued entries. Experimental studies show that the proposed method performs well in both continuous and discrete cases.

Highlights

  • The naive Bayes, one of the most popular learning algorithms in machine learning and data mining, has been widely used in many areas for classifying new instances given a vector of features

  • To extend the naive Bayes to account for the dependence of features, Friedman et al. (1997) developed the tree-augmented naive Bayes (TAN), in which the class node points directly to all feature nodes and each feature can have at most one parent from another feature in addition to the class node

  • We propose a new tree-based classifier as an extension of the naive Bayes by evaluating the dependence of features through pair copulas constructed via a D-vine tree structure


Summary

Introduction

The naive Bayes, one of the most popular learning algorithms in machine learning and data mining, has been widely used in many areas for classifying new instances given a vector of features. We propose a new tree-based classifier as an extension of the naive Bayes that evaluates the dependence of features through pair copulas constructed via a D-vine tree structure. A technique called pair-copula construction (Aas et al, 2006) was developed to model multivariate dependence using a cascade of bivariate copulas, each acting on only two variables at a time. The margins in the proposed model can be treated exactly as in the naive Bayes, while the pair copulas evaluate the dependence of the features (variables).
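To make the idea concrete, the sketch below shows one hypothetical instance of this construction, not the paper's actual model: Gaussian margins (as in Gaussian naive Bayes) multiplied by Gaussian pair copulas on adjacent feature pairs, i.e. only the first tree of a D-vine, with deeper trees truncated to independence. All function names, the Gaussian choices, and the truncation are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pair_copula_density(u, v, rho):
    # Bivariate Gaussian copula density at (u, v) with correlation rho.
    # For rho = 0 this equals 1, recovering independence.
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho**2
    return np.exp(-(rho**2 * (x**2 + y**2) - 2.0 * rho * x * y)
                  / (2.0 * det)) / np.sqrt(det)

def dvine_class_density(x, means, stds, rhos):
    # Class-conditional density: product of Gaussian margins times the
    # pair copulas on the first D-vine tree (adjacent feature pairs only;
    # a truncated-vine assumption made here for brevity).
    margins = norm.pdf(x, means, stds)
    u = norm.cdf(x, means, stds)           # probability-integral transforms
    pairs = [gaussian_pair_copula_density(u[i], u[i + 1], rhos[i])
             for i in range(len(rhos))]
    return np.prod(margins) * np.prod(pairs)

def classify(x, params, priors):
    # Assign x to the class maximizing prior * class-conditional density,
    # i.e. the Bayes rule under zero-one loss.
    scores = {c: priors[c] * dvine_class_density(x, *params[c])
              for c in params}
    return max(scores, key=scores.get)
```

Setting every `rho` to zero makes all pair-copula factors equal 1, so the density collapses to the plain Gaussian naive Bayes product of margins, which shows how the copula terms act purely as a dependence correction on top of the margins.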

Background
Learning Process
Extension to Discrete Cases
Experimental Studies
Studies for Discrete Cases
Findings
Conclusion