Abstract
Diabetes is a chronic disease that occurs when blood glucose levels become too high. It is responsible for a number of serious complications in an affected patient's body. Early detection, however, can reduce the risk of critical outcomes such as death and minimize the chance of losing vital organs to the disease. The aim of this study is to construct a predictive model by examining several machine learning techniques, namely Decision Tree, K-Nearest Neighbour, Naive Bayes, Support Vector Machine, Logistic Regression, eXtreme Gradient Boosting, Multi-Layer Perceptron and Random Forest, on two different datasets of diabetes patients: the Pima Indian diabetes dataset and the Sylhet Diabetes Hospital dataset. Several popular and effective feature subset selection procedures have also been used to eliminate unnecessary attributes. The results show that, on the Sylhet hospital dataset, Random Forest delivers the highest accuracy (97.5%), F-measure (97.5%) and Area under the Receiver Operating Characteristic Curve (99.80%) when combined with the Gain Ratio Attribute Evaluation feature subset selection technique. On the Pima Indian dataset, Logistic Regression delivers the highest accuracy (77.7%) and F-measure (77%) with Information Gain Attribute Evaluation, and the highest Area under the Receiver Operating Characteristic Curve (83%) with both Correlation-based Feature Selection Subset Evaluation and Correlation Attribute Evaluation. In this study, 10-fold cross-validation was used for performance measurement.
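As a rough illustration of the attribute ranking that Information Gain and Gain Ratio evaluation perform, the sketch below computes both scores for discrete attributes using their standard textbook definitions. It is not the study's implementation: the function names, the attribute names and the eight-row toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the class entropy inside each attribute value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it carries no information
    return information_gain(values, labels) / split_info


# Hypothetical toy data (made up for illustration, not taken from either dataset).
polyuria = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
gender = ["m", "f", "m", "f", "m", "f", "m", "f"]
diabetic = [1, 1, 1, 0, 0, 0, 1, 0]

scores = {
    "polyuria": gain_ratio(polyuria, diabetic),
    "gender": gain_ratio(gender, diabetic),
}
# Attributes sorted from most to least informative about the class label.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this kind of scheme, attributes with the lowest scores are candidates for removal before the classifiers are trained; the cut-off used to discard attributes is a separate design choice that the abstract does not specify.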