Abstract

Despite its unrealistic feature-independence assumption, the naive Bayes classifier is simple yet competes well with more sophisticated classifiers under the zero-one loss function when assigning an observation to a class given the observed features. However, it has been shown that the naive Bayes performs poorly in both estimation and classification in some cases where the features are correlated. Researchers have therefore developed many approaches that free the naive Bayes of this basic but rarely satisfied assumption. In this paper, we propose a new classifier, also free of the independence assumption, that evaluates the dependence of features through pair copulas constructed via a graphical model called the D-vine tree. This tree structure decomposes the multivariate dependence into many bivariate dependencies and thus makes it possible to evaluate the dependence of features easily and efficiently, even for high-dimensional data with large sample sizes. We further extend the proposed method to features with discrete-valued entries. Experimental studies show that the proposed method performs well in both continuous and discrete cases.

Highlights

  • The naive Bayes, one of the most popular learning algorithms in machine learning and data mining, has been widely used in many areas for classifying new instances given a vector of features

  • To extend the naive Bayes to account for the dependence of features, Friedman et al. (1997) developed the tree-augmented naive Bayes (TAN), in which the class node points directly to all feature nodes and each feature can have at most one parent from another feature in addition to the class node

  • We propose a new tree-based classifier as an extension of the naive Bayes by evaluating the dependence of features through pair copulas constructed via a D-vine tree structure


Summary

Introduction

The naive Bayes, one of the most popular learning algorithms in machine learning and data mining, has been widely used in many areas for classifying new instances given a vector of features. We propose a new tree-based classifier as an extension of the naive Bayes that evaluates the dependence of features through pair copulas constructed via a D-vine tree structure. A technique called pair-copula construction (Aas et al, 2006) was developed to model multivariate dependence using a cascade of bivariate copulas, each acting on only two variables at a time. The margins in the proposed model can be treated exactly as in the naive Bayes, while the pair copulas evaluate the dependence of the features (variables).
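To make the idea concrete, the sketch below shows one hypothetical instance of this construction, not the paper's actual model: Gaussian margins (as in Gaussian naive Bayes) multiplied by Gaussian pair copulas on adjacent feature pairs, i.e. only the first tree of a D-vine, with deeper trees truncated to independence. All function names, the Gaussian choices, and the truncation are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pair_copula_density(u, v, rho):
    # Bivariate Gaussian copula density at (u, v) with correlation rho.
    # For rho = 0 this equals 1, recovering independence.
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho**2
    return np.exp(-(rho**2 * (x**2 + y**2) - 2.0 * rho * x * y)
                  / (2.0 * det)) / np.sqrt(det)

def dvine_class_density(x, means, stds, rhos):
    # Class-conditional density: product of Gaussian margins times the
    # pair copulas on the first D-vine tree (adjacent feature pairs only;
    # a truncated-vine assumption made here for brevity).
    margins = norm.pdf(x, means, stds)
    u = norm.cdf(x, means, stds)           # probability-integral transforms
    pairs = [gaussian_pair_copula_density(u[i], u[i + 1], rhos[i])
             for i in range(len(rhos))]
    return np.prod(margins) * np.prod(pairs)

def classify(x, params, priors):
    # Assign x to the class maximizing prior * class-conditional density,
    # i.e. the Bayes rule under zero-one loss.
    scores = {c: priors[c] * dvine_class_density(x, *params[c])
              for c in params}
    return max(scores, key=scores.get)
```

Setting every `rho` to zero makes all pair-copula factors equal 1, so the density collapses to the plain Gaussian naive Bayes product of margins, which shows how the copula terms act purely as a dependence correction on top of the margins.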

Background
Learning Process
Extension to Discrete Cases
Experimental Studies
Studies for Discrete Cases
Findings
Conclusion