Identify Best Learning Method for Heart Diseases Prediction Under impact of Different Datasets Characteristics

Zahraa Chaffat Oleiwi, Ebtesam N Alshemmary, Salam Al-Augby

doi:10.31642/jokmc/2018/100104

Open Access

https://doi.org/10.31642/jokmc/2018/100104

Abstract

This paper presents an experimental study of how the characteristics of heart disease datasets affect the performance of classification algorithms, with the aim of identifying the best algorithm for each dataset given its characteristics. Five machine learning algorithms (logistic regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM)), a single-layer neural network (ANN), and a deep neural network (DNN) were evaluated on five heart disease datasets under four data complexity measures: number of samples (dataset size), number of features (dataset dimension), data sparsity, and feature correlation. All datasets were preprocessed and normalized, and a mutual information-based feature selection method was then applied to address overfitting. The results show that, in general, the machine learning algorithms, especially Random Forest, achieve higher classification accuracy than the deep learning network. Moreover, high sparsity and low mutual information in a dataset degrade the performance of the classification algorithms more than the other data characteristics do.
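Two of the dataset characteristics named above, sparsity and mutual information with the class label, can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so the following assumes sparsity is the fraction of zero-valued entries and that mutual information is computed on discrete (or already discretized) feature values. The function names `sparsity`, `mutual_information`, and `select_top_k` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def sparsity(rows):
    """Fraction of zero-valued entries in a 2-D list (assumed definition)."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / total if total else 0.0


def mutual_information(xs, ys):
    """Mutual information in nats between two discrete sequences of equal length."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(rows, labels, k):
    """Rank feature columns by mutual information with the labels.

    Returns the indices of the k highest-scoring columns and all scores.
    Illustrative only: the paper's selection procedure may differ.
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([r[j] for r in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores
```

Calling `select_top_k(rows, labels, k)` on a table of discrete features returns the column indices to keep. Continuous clinical features would first need binning, or a continuous mutual information estimator, before this count-based formula applies.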

Full Text