Abstract

In recent years, medical data mining models have come to be regarded as an important tool for predicting disease. Healthcare produces large volumes of data, and this volume is itself one of the challenges in predicting and analyzing a target disease. Data mining models can convert such data into useful information, and a logical, scientific analysis of that information supports accurate decision-making and prediction. A second challenge in disease prediction is identifying the features that matter most. Feature subset selection is performed to improve model performance, with the aim of reaching the highest accuracy. The purpose of this study is to select significant features by comparing data mining models for predicting liver disease, based on an extraction, loading, transformation, analysis (ELTA) approach, in support of correct diagnosis. The models compared under the ELTA approach are random forest, Multi-Layer Perceptron (MLP) neural network, Bayesian network, Support Vector Machine (SVM), and Particle Swarm Optimization (PSO)-SVM. Among these, the PSO-SVM model performs best on the criteria of specificity, sensitivity, accuracy, Area Under the Curve (AUC), F-measure, precision, and False Positive Rate (FPR). The models were evaluated on a liver disease dataset using 10-fold cross-validation. The average estimated accuracy was 87.35%, 78.91%, 66.78%, 76.51%, and 95.17% for the random forest, MLP neural network, Bayesian network, SVM, and PSO-SVM models, respectively. On the evaluation criteria above, the hybrid PSO-SVM-based optimized model achieved the highest accuracy with the fewest features.
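The threshold-based criteria named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the function name `confusion_metrics`, the helper `_ratio`, and the example counts are hypothetical and are not taken from the study. AUC is left out because it requires ranked classifier scores rather than a single confusion matrix.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification criteria from confusion-matrix counts.

    tp/fn count diseased cases predicted correctly/incorrectly;
    tn/fp count healthy cases predicted correctly/incorrectly.
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    fpr = _ratio(fp, fp + tn)           # false positive rate = 1 - specificity
    # F-measure: harmonic mean of precision and sensitivity
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "fpr": fpr,
        "f_measure": f_measure,
    }


# Illustrative counts only (hypothetical, not results from the study).
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under 10-fold cross-validation, a function like this would typically be applied to the held-out fold in each of the ten rounds and the per-fold values averaged, which is one common way to arrive at an "average estimated accuracy" of the kind reported above.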
