Abstract
This study addresses the challenge of accurately identifying diabetes mellitus in individuals. Using real-world diagnostic data that are accessible online, we apply five machine learning models, namely Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes, eXtreme Gradient Boosting (XGBoost), and Deep Neural Network (DNN), to the PIMA Indian Diabetes and NHANES 1999-2016 datasets. The data were pre-processed by handling null values, outliers, and class imbalance, and were then normalized. Our results show that the RF model achieves 79% accuracy for binary classification on the PIMA Indian Diabetes dataset, using a 60:40 train-test split with features selected by Boruta. On the NHANES 1999-2016 dataset, the XGBoost model performs best, achieving 92% accuracy for binary classification and 91% for multiclass classification.
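The pre-processing and 60:40 split described above can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names are hypothetical, the toy values are invented and come from neither dataset, and median imputation and min-max scaling are assumptions, since the abstract does not say which null-handling or normalization method was used. Outlier handling, class rebalancing, Boruta feature selection, and model training are left out.

```python
import random
from statistics import median

def impute_median(rows):
    """Replace None entries with the median of the observed values in that column."""
    cols = list(zip(*rows))
    meds = [median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, meds)] for row in rows]

def stratified_split(labels, train_frac=0.6, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx

def min_max_scale(rows, ref_rows):
    """Scale rows using the per-column min and max of ref_rows (the training part)."""
    cols = list(zip(*ref_rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

# Toy data: two made-up features, with None marking a null value.
X = [[10, 1.0], [20, None], [30, 3.0], [40, 4.0], [None, 5.0],
     [60, 6.0], [70, 7.0], [80, 8.0], [90, 9.0], [100, None]]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

X_filled = impute_median(X)
train_idx, test_idx = stratified_split(y, train_frac=0.6)
X_train = [X_filled[i] for i in train_idx]
X_test = [X_filled[i] for i in test_idx]

# Scaling statistics come from the training rows only, so test values may
# fall slightly outside [0, 1].
X_train_scaled = min_max_scale(X_train, X_train)
X_test_scaled = min_max_scale(X_test, X_train)
```

Taking the scaling statistics from the training rows alone keeps information about the test rows out of the pre-processing. A stricter pipeline would also compute the imputation medians on the training rows only.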
More From: Journal of Engineering Technology and Applied Physics