Abstract

This paper presents an experimental study of how the characteristics of heart disease datasets affect the performance of classification algorithms, with the aim of identifying the best algorithm for each dataset given its characteristics. Five machine learning algorithms (logistic regression (LR), K-Nearest Neighbor (KNN), decision tree (DT), Random Forest (RF), and support vector machine (SVM)), a single-layer neural network (ANN), and a deep neural network (DNN) are evaluated on five heart disease datasets under four data complexity measures: number of samples (dataset size), number of features (dataset dimension), data sparsity, and feature correlation. All datasets were preprocessed and normalized, and a mutual information-based feature selection method was then applied to mitigate overfitting. The results show that, in general, the machine learning algorithms, especially Random Forest, achieve higher classification accuracy than the deep learning network. On the other hand, high sparsity and low mutual information in a dataset degrade the performance of the classification algorithms more than the other data characteristics do.
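
As an illustration of the four dataset characteristics and the mutual information scores mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that sparsity means the fraction of zero entries, that feature correlation is summarized as the mean absolute pairwise Pearson correlation, and that features are discrete when mutual information is computed; the abstract does not state these definitions. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sparsity(rows):
    """Fraction of entries equal to zero (assumed definition of sparsity)."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / total


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / (sx * sy)


def mean_abs_correlation(rows):
    """Mean absolute Pearson correlation over all feature pairs (assumed summary)."""
    cols = list(zip(*rows))
    pairs = list(combinations(cols, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


def mutual_information(feature, labels):
    """Mutual information in bits between a discrete feature and the class labels."""
    n = len(labels)
    pf, pl = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    return sum(
        (c / n) * math.log2(c * n / (pf[f] * pl[l]))
        for (f, l), c in joint.items()
    )


def describe(rows, labels):
    """Collect the four dataset characteristics plus per-feature MI scores."""
    cols = list(zip(*rows))
    return {
        "n_samples": len(rows),
        "n_features": len(cols),
        "sparsity": sparsity(rows),
        "mean_abs_corr": mean_abs_correlation(rows),
        "mutual_info": [mutual_information(c, labels) for c in cols],
    }


# Hypothetical toy data: 4 samples, 3 binary features, binary label.
rows = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
]
labels = [1, 1, 0, 0]

summary = describe(rows, labels)
# Rank feature indices by MI score, highest first, as a simple selection rule.
ranking = sorted(range(summary["n_features"]),
                 key=lambda i: summary["mutual_info"][i], reverse=True)
```

Ranking features by these scores and keeping the top few is one simple way to do mutual information-based selection. Continuous features would need binning or a dedicated estimator before such a score is meaningful.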

