An Empirical Analysis of Imbalanced Data Classification

Shu Zhang, Malek Mouhoub, Samira Sadaoui

doi:10.5539/cis.v8n1p151

Abstract

SVM has received top consideration for addressing the challenging problem of learning from imbalanced data. Here, we conduct an empirical classification analysis of new UCI datasets that have different imbalance ratios, sizes and complexities. The experimentation consists of comparing the classification results of SVM with two other popular classifiers, Naive Bayes and decision tree C4.5, to explore their pros and cons. To make the comparative experiments more comprehensive and to gain a better idea of the learning performance of each classifier, we employ four performance metrics in total: Sensitivity, Specificity, G-means and time-based efficiency. For each benchmark dataset, we perform an empirical search for the learning model through numerous training runs of the three classifiers under different parameter settings and performance measurements. This paper reports the most significant results, i.e. the highest performance achieved by each classifier for each dataset. In summary, SVM outperforms the other two classifiers in terms of Sensitivity (or Specificity) for all the datasets, and is more accurate in terms of G-means when classifying large datasets.
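The three accuracy-style metrics named above can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions, Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP) and G-mean = sqrt(Sensitivity × Specificity), and assumes the minority class is labelled 1 and treated as the positive class. The paper may state these formulas slightly differently, and the helper names `confusion_counts` and `imbalance_metrics` are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating `positive` as the minority class (assumption)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # minority instance correctly recognised
            else:
                fn += 1  # minority instance misclassified as majority
        else:
            if p == positive:
                fp += 1  # majority instance misclassified as minority
            else:
                tn += 1  # majority instance correctly recognised
    return tp, fn, tn, fp


def imbalance_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Standard Sensitivity, Specificity and G-mean; empty classes yield 0.0."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # The geometric mean is high only when BOTH classes are recognised well.
    g_mean = math.sqrt(sensitivity * specificity)
    return {"sensitivity": sensitivity, "specificity": specificity, "g_mean": g_mean}


# Hypothetical toy labels: 2 minority (1) and 8 majority (0) instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(imbalance_metrics(y_true, y_pred))
# sensitivity 0.5, specificity 0.875, g_mean about 0.661
```

Because the G-mean multiplies the per-class rates, a classifier cannot score well on it by favouring one class, which is why it is commonly paired with Sensitivity and Specificity in imbalanced settings.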

Highlights

Data classification is a significant research topic in the areas of data mining and machine learning
The experimentation consists of comparing the classification results of Support Vector Machine (SVM) with two other popular classifiers, Naive Bayes and decision tree C4.5, to explore their pros and cons
A well-known classifier is the Support Vector Machine (SVM), which was initially introduced by Vapnik (Vapnik, 1998)

Summary

Introduction

Data classification is a significant research topic in the areas of data mining and machine learning. Learning from imbalanced training data is difficult because standard machine learning systems often misclassify minority instances as majority ones (Koknar-Tezel & Latecki, 2009). As a result, the probability that a new instance is correctly assigned to the minority class is very low (Haibo & Garcia, 2009).
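To illustrate why overall accuracy hides this problem, the following sketch builds a hypothetical dataset with 95 majority and 5 minority instances (imbalance ratio 19) and a degenerate classifier that always predicts the majority class. The data, the helper name `accuracy_and_gmean` and the choice of label 1 for the minority class are assumptions for illustration only; the metrics follow the standard definitions of sensitivity, specificity and G-mean.

```python
import math


def accuracy_and_gmean(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, g_mean); `positive` marks the minority class (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, math.sqrt(sensitivity * specificity)


# Hypothetical imbalanced labels: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# Degenerate classifier: every instance is assigned to the majority class.
y_pred = [0] * len(y_true)

acc, sens, gm = accuracy_and_gmean(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} g_mean={gm:.2f}")
# prints: accuracy=0.95 sensitivity=0.00 g_mean=0.00
```

The 95% accuracy looks strong, yet no minority instance is recognised, so sensitivity and the G-mean both collapse to zero. This is the failure mode that minority-aware metrics are meant to expose.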

Support Vector Machine

Performance Measurements

Related Works

An Analysis Approach of Imbalanced Data Classification

Data Selection

Data Preprocessing

Measurement selection

Classification with Naive Bayes and J48

Classifier Comparison

Empirical Analysis and Comparison

Fertility

User Knowledge Modeling

Vertebral Column

Rebalanced Seismic Bumps

Rebalanced Bank Marketing

Findings

Conclusion and Future Work

Journal: Computer and Information Science
Publication Date: Jan 29, 2015
Citations: 14
License type: cc-by

Similar Papers

Consolidated performance measurement framework for government e-procurement focusing on internal stakeholders
Suvil Chomchaiya ... Vatcharaporn Esichaikul
Information Technology & People | VOL. 29
06 Jun 2016

Comparison of Classification Result Optimization Using PSO on the C.45 and CART Algorithms (Case Study: Stroke Disease Prediction)
Muhammad Guschoyin ... Handoyo Widi Nugroho
Jurnal Informatika | VOL. 24
25 Jun 2024

Performance metrics analysis for aircraft maintenance process control
Kevin M Taaffe ... Lindsey Grigg
Journal of Quality in Maintenance Engineering | VOL. 20
06 May 2014

Performance measurement in humanitarian relief chains
Benita M Beamon ... Burcu Balcik
International Journal of Public Sector Management | VOL. 21
25 Jan 2008
