Abstract

The accuracy of a classifier is central to the success of any prediction model: the more accurate the classifier, the more robust the system built on it. In this paper, a disease prediction model is developed in Python for classifying diabetes in patients, and a comparative analysis of the performance of machine learning classification algorithms is carried out. Classifier performance is improved by tuning the classifiers' hyperparameters and by applying different dataset preprocessing methods. In this experimental analysis, four models are created, each based on a dataset obtained by applying a different preprocessing method to the PIMA dataset. For each model, the K-Nearest Neighbors, Decision Tree, Random Forest, and Support Vector Machine classification algorithms are applied, and each classifier's hyperparameters are tuned to obtain better results. A detailed analysis is also performed to identify the best prediction model, the best classifier, and the most effective preprocessing methods. The prediction models use the F1 score as the main metric. The highest F1 score and accuracy are 75.68% and 88.61%, respectively, achieved by the Random Forest classifier on dataset model D3, which is obtained by removing samples with missing or unknown values from the PIMA dataset.
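
The abstract reports an F1 score well below the accuracy, which is typical when the positive class is the minority. As a rough illustration only, the sketch below shows how the two metrics are computed from predictions and how samples with missing values might be dropped before training. The function names, the use of `None` as the missing-value marker, and the toy labels are assumptions made for this example; they are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def drop_incomplete(rows: Sequence[Sequence[Optional[float]]]) -> List[Sequence[Optional[float]]]:
    """Keep only samples in which every feature value is present.

    Assumption: missing or unknown values are encoded as None.
    """
    return [row for row in rows if all(value is not None for value in row)]


def f1_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Return (F1 score, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return f1, accuracy


# Toy data (hypothetical): two rows contain a missing feature and are dropped.
samples = [
    [148.0, 72.0, 33.6],
    [85.0, None, 26.6],
    [183.0, 64.0, 23.3],
    [None, 66.0, 28.1],
]
complete_samples = drop_incomplete(samples)

# Toy labels (hypothetical): 2 positives out of 10, one of them missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
f1, accuracy = f1_and_accuracy(y_true, y_pred)
print(f"F1 = {f1:.4f}, accuracy = {accuracy:.4f}")
```

On these toy labels a single missed positive halves the recall while leaving accuracy high, which is why F1 is the more informative main metric for an imbalanced diagnosis task.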
