Abstract

High-dimensional medical data makes prediction a complex and difficult task. This study aims to build predictive models for such data. Two medical datasets are used: a publicly available dataset (Heart Disease data) and a real clinical dataset (Eye Infection data). A wide range of machine learning algorithms is applied in the modeling stage: Decision Tree, Multilayer Perceptron (MLP), Naive Bayes (NB), Random Forest, and Support Vector Machine. In addition, bagging and voting ensemble methods are applied on top of the base learners. Both percentage-split and cross-validation methods are used for model validation, and the predictive models are evaluated with well-established metrics: accuracy, precision, recall, and F-measure. The modeling method comprises two stages. The first stage uses all available features for prediction. The second stage uses only features selected on the basis of positive correlation. The same method is also applied to deep learning: a Convolutional Neural Network (CNN) is trained so that its outcomes can be compared with those of the conventional machine learning algorithms. The experimental results show that better predictions are achieved in the second stage. The experiments also indicate that percentage-split validation produces better predictive models than cross-validation, and that ensemble methods yield marginally better outcomes than the base models. NB outperformed the other algorithms on the Heart Disease data with the highest accuracy of 88.90%, and MLP obtained 97.50% accuracy on the Eye Infection data, both using an 80–20 split in the second stage. However, the CNN model performed poorly because of the limited size of the datasets considered.
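The second-stage feature selection and the evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not state which correlation measure or threshold was used, so Pearson correlation with the class label and a cutoff of zero are assumptions here; the function names, feature names, and toy values are hypothetical and are not taken from either dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_positive_features(rows, labels, names):
    """Return names of features whose correlation with the label is > 0.

    Assumption: "positive correlation" means Pearson r > 0 against the label.
    """
    selected = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        if pearson(column, labels) > 0:
            selected.append(name)
    return selected


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-measure for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical toy data: three features, four samples, binary label.
names = ["feat_a", "feat_b", "feat_c"]
rows = [[1, 5, 9], [2, 4, 9], [3, 3, 9], [4, 1, 9]]
labels = [0, 0, 1, 1]

kept = select_positive_features(rows, labels, names)
scores = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
```

In this toy example only `feat_a` rises with the label, so it is the only feature kept; `feat_b` is negatively correlated and `feat_c` is constant. The reduced feature set would then be passed to the same classifiers and validation schemes as in the first stage.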
