Abstract

Imbalanced data is one of the challenges in classification tasks in machine learning. Class disparity produces a biased model output regardless of how recent the technology is. However, deep learning algorithms such as deep belief networks have shown promising results in many domains, especially in image processing. Therefore, in this paper, we review the effect of imbalanced class distributions using deep belief networks as the benchmark model and compare it with conventional machine learning algorithms, such as backpropagation neural networks, decision trees, naive Bayes and support vector machines, on the MNIST handwritten digit dataset. The experiment shows that although the benchmark algorithm is stable and suitable for multiple domains, the imbalanced data distribution still affects the outcome of the conventional machine learning algorithms.
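
As a hedged illustration only, and not the authors' exact protocol, the sketch below compares several of these conventional classifiers on an artificially imbalanced digit dataset. It assumes scikit-learn, uses its bundled digits set as a small stand-in for MNIST, and omits the deep belief network benchmark; the class names and sampling choices are illustrative assumptions.

```python
# A rough sketch of the kind of comparison described above, NOT the authors'
# exact protocol. Assumes scikit-learn; its bundled digits set stands in for
# MNIST, and the deep belief network benchmark is omitted.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier  # backpropagation-trained network
from sklearn.metrics import balanced_accuracy_score

X, y = load_digits(return_X_y=True)

# Turn digit 0 into a minority class by keeping only ~5% of its samples.
rng = np.random.default_rng(0)
keep = (y != 0) | (rng.random(len(y)) < 0.05)
X, y = X[keep], y[keep]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "backprop NN": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)                              # plain accuracy
    bal = balanced_accuracy_score(y_te, model.predict(X_te))   # average per-class recall
    print(f"{name:13s}  accuracy={acc:.3f}  balanced accuracy={bal:.3f}")
```

Reporting balanced accuracy alongside plain accuracy is one common way to expose the bias towards the majority class that plain accuracy can hide.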

Highlights

  • Class imbalance occurs when the number of instances is not evenly distributed among the classes of a dataset

  • In this paper, we review the effect of imbalanced class distributions using deep belief networks as the benchmark model and compare it with conventional machine learning algorithms, such as backpropagation neural networks, decision trees, naïve Bayes and support vector machines, on the MNIST handwritten digit dataset

  • The experiment shows that although the benchmark algorithm is stable and suitable for multiple domains, the imbalanced data distribution still affects the outcome of the conventional machine learning algorithms

Introduction

Class imbalance occurs when the number of instances is not evenly distributed among the classes of a dataset. The majority class is the class with the most instances, while the minority class is the class with the fewest instances. Among the disadvantages caused by imbalanced class data in classification are overfitting, a deficient class model and misclassification. Overfitting is a result of accuracy bias, where the overwhelming number of values in one class dominates the sparse values of another class: the model may report a highly accurate result, yet it remains biased towards the majority class.
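
To make this accuracy bias concrete, the minimal sketch below (assuming scikit-learn; not taken from the paper) uses a classifier that always predicts the majority class on a roughly 95:5 split: its reported accuracy is about 95%, yet it never recognizes the minority class.

```python
# Minimal illustration of accuracy bias on imbalanced data (assumes scikit-learn).
# A model that always predicts the majority class scores ~95% accuracy on a
# 95:5 split while achieving zero recall on the minority class.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))             # features (irrelevant to the point)
y = (rng.random(1000) < 0.05).astype(int)  # ~5% minority class labelled 1

majority_only = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = majority_only.predict(X)

print("accuracy:       ", accuracy_score(y, pred))             # ~0.95, looks good
print("minority recall:", recall_score(y, pred, pos_label=1))  # 0.0, minority never predicted
```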
