Abstract

The emergence of machine learning in medicine has revolutionized the detection and treatment of ailments. Machine learning models have been adopted in healthcare research for knowledge extraction and decision support. However, mainstream health informatics research concentrates on generating accurate models and often overlooks the significance of data preprocessing. This paper builds robust classification models for predicting diabetes, heart disease, and liver disease, using a variety of preprocessing techniques to achieve optimal results. The models are implemented on datasets sourced from the University of California, Irvine (UCI) Machine Learning Repository. The preprocessing techniques used include feature engineering, data pruning, oversampling of skewed datasets, imputation of missing values, encoding of categorical variables, and feature scaling. These techniques considerably improve the performance of the classification algorithms, which include Random Forest, K-Nearest Neighbours (KNN), and Support Vector Machine (SVM), among others. Hyperparameter tuning further improves these algorithms and significantly raises their accuracy scores. The maximum accuracies obtained for heart disease, liver disease, and diabetes prediction are 90.16%, 73%, and 93.23%, respectively. The paper also highlights the advantages of detecting these diseases at an early stage, which could make a substantial difference in numerous cases. The performance of the classifiers is documented using metrics such as Accuracy, Balanced Accuracy, and F1 score, and the classifiers are further visualized and compared to identify the best results.
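To make the preprocessing and evaluation steps named above concrete, here is a minimal, dependency-free sketch of three of them (mean imputation of missing values, min-max feature scaling, and random oversampling of a skewed dataset) together with the three reported metrics. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, mean imputation and random duplication oversampling are generic choices that may differ from the exact variants used in the paper, and missing values are assumed to be encoded as `None`.

```python
import random
from collections import Counter


def impute_mean(column):
    """Replace None entries in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a dominant class cannot inflate the score."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage on toy data (not taken from the UCI datasets):
ages = impute_mean([50, None, 70])   # [50, 60.0, 70]
scaled = min_max_scale(ages)         # [0.0, 0.5, 1.0]
rows, labels = random_oversample([[1], [2], [3]], [0, 0, 1])  # class 1 now has 2 rows

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(accuracy(y_true, y_pred))           # ≈ 0.833
print(balanced_accuracy(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))           # ≈ 0.667
```

The toy predictions show why Balanced Accuracy and F1 score are reported alongside Accuracy: on a skewed label set, a classifier that misses half of the positive cases still scores about 83% accuracy, while balanced accuracy (0.75) and F1 (about 0.67) expose the weakness. In practice, scikit-learn provides equivalents such as `SimpleImputer`, `MinMaxScaler`, `balanced_accuracy_score`, and `f1_score`.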
