Abstract

SVM has been given top consideration for addressing the challenging problem of data imbalance learning. Here, we conduct an empirical classification analysis of new UCI datasets that have different imbalance ratios, sizes and complexities. The experimentation consists of comparing the classification results of SVM with two other popular classifiers, Naive Bayes and decision tree C4.5, to explore their pros and cons. To make the comparative experiments more comprehensive and to get a better idea of the learning performance of each classifier, we employ four performance metrics in total: Sensitivity, Specificity, G-means and time-based efficiency. For each benchmark dataset, we perform an empirical search of the learning model through numerous training runs of the three classifiers under different parameter settings and performance measurements. This paper reports the most significant results, i.e. the highest performance achieved by each classifier for each dataset. In summary, SVM outperforms the other two classifiers in terms of Sensitivity (or Specificity) for all the datasets, and is more accurate in terms of G-means when classifying large datasets.

Highlights

  • Data classification is a significant research topic in the areas of data mining and machine learning

  • The experimentation consists of comparing the classification results of Support Vector Machine (SVM) with two other popular classifiers, Naive Bayes and decision tree C4.5, to explore their pros and cons

  • A well-known classifier is the Support Vector Machine (SVM), which was initially introduced by Vapnik (Vapnik, 1998)

Summary

Introduction

Data classification is a significant research topic in the areas of data mining and machine learning. Learning from imbalanced training data is difficult, since standard machine learning systems often misclassify minority instances as majority ones (Koknar-Tezel & Latecki, 2009). This means that the likelihood of a new instance being classified into the minority class is very low (Haibo & Garcia, 2009).
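
To illustrate why imbalanced data calls for metrics other than plain accuracy, the following is a minimal sketch and not the authors' code. It assumes the usual definitions of Sensitivity (true positive rate), Specificity (true negative rate) and G-mean (the geometric mean of the two), with the minority class treated as the positive class. The function name `imbalance_metrics` and the toy labels are hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, g_mean) for binary labels.

    Assumption: `positive` marks the minority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean


# Hypothetical toy data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec, g = imbalance_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} "
      f"specificity={spec:.2f} g_mean={g:.2f}")
```

In this toy case accuracy is high even though no minority instance is recognized, while Sensitivity and G-mean both drop to zero. This is the kind of failure that class-specific metrics are meant to expose.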

Support Vector Machine
Performance Measurements
Related Works
An Analysis Approach of Imbalanced Data Classification
Data Selection
Data Preprocessing
Measurement Selection
Classification with Naive Bayes and J48
Classifier Comparison
Empirical Analysis and Comparison
Fertility
User Knowledge Modeling
Vertebral Column
Rebalanced Seismic Bumps
Rebalanced Bank Marketing
Findings
Conclusion and Future Work