A Comparison Study of Cost-Sensitive Learning and Sampling Methods on Imbalanced Data Sets

Jin Wei Zhang, Wu Tao Chen, Hui Juan Lu, Yi Lu

doi:10.4028/www.scientific.net/amr.271-273.1291

Abstract

A classifier built from a data set with a highly skewed class distribution generally assigns unknown samples to the majority class much more often than to the minority class, because it is designed to maximize overall classification accuracy. We compare three methods for classifying data sets whose class distribution is imbalanced and whose misclassification costs are non-uniform: a cost-sensitive learning method in which the misclassification cost is embedded in the algorithm, an over-sampling method, and an under-sampling method. The goal is to determine which of the three produces the best overall classification, and under what conditions. We reach the following conclusions:
1. Cost-sensitive learning is suitable for classifying imbalanced data sets. Overall it outperforms the sampling methods, and it is more stable than they are except when the data set is quite small.
2. If the data set is highly skewed or quite small, over-sampling methods may perform better.
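The three approaches compared above can be sketched as follows for a binary problem. This is a minimal illustration under stated assumptions, not the authors' implementation: random duplication and random removal are assumed as the over- and under-sampling schemes, and the cost-sensitive part uses the minimum-expected-cost decision threshold rather than a cost matrix embedded inside a particular learning algorithm. The function names `random_oversample`, `random_undersample`, and `min_cost_label` and the parameters `cost_fp` and `cost_fn` are hypothetical.

```python
import random
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    X: Sequence[T], y: Sequence[Hashable], minority: Hashable, seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Hypothetical helper: duplicate randomly chosen minority samples until
    the minority class is as large as the rest of the data (binary case).
    Assumes at least one minority sample is present."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    # Draw with replacement from the minority indices to close the gap.
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(
    X: Sequence[T], y: Sequence[Hashable], minority: Hashable, seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Hypothetical helper: keep every minority sample plus an equally sized
    random subset of the majority samples (binary case)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Sample without replacement; the discarded majority samples are lost.
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)  # preserve the original sample order
    return [X[i] for i in idx], [y[i] for i in idx]


def min_cost_label(p_positive: float, cost_fp: float, cost_fn: float) -> int:
    """Hypothetical helper: return 1 (minority) or 0 (majority) by minimizing
    expected misclassification cost.

    Predicting 1 costs (1 - p) * cost_fp in expectation; predicting 0 costs
    p * cost_fn. Predicting 1 is no worse exactly when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if p_positive >= threshold else 0
```

With `cost_fn` five times `cost_fp`, `min_cost_label(0.3, 1.0, 5.0)` assigns a sample with an estimated minority-class probability of only 0.3 to the minority class, whereas equal costs would assign it to the majority class. Note the trade-off the sketch makes visible: over-sampling repeats existing minority samples, while under-sampling discards majority samples, which matters most when the data set is small.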

Journal: Advanced Materials Research
Publication Date: Jul 1, 2011
Citations: 4