Abstract

Diabetes is a serious metabolic disorder that affects a large number of people. Its main contributing factors include obesity, age, lifestyle, malnutrition and blood pressure. People with diabetes are at high risk for diseases of the heart, kidneys, eyes and other organs, so early diagnosis is important for preventing these complications. Machine learning (ML) and big data analytics play an important role in the healthcare industry, where ML techniques are used to predict disease and to improve prediction performance. This paper applies ML classification techniques, implemented in Python, to the Pima Indian Diabetes Dataset (PIDD) from the UCI ML repository in order to predict the presence of diabetes in patients as accurately as possible. We propose a diabetes prediction model that classifies patients using factors such as BMI, glucose and age. Five ML techniques (KNN, XGBoost, Logistic Regression, Gradient Boosting Classifier and Random Forest Classifier) were used in the experiment to detect diabetes at an early stage, and their performance was evaluated using Error Rate, Accuracy, Precision, Recall and F-measure. XGBoost gave the best result of the algorithms tested, with a maximum accuracy of 82%.
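The five evaluation measures named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of that calculation, not the paper's own code: the function name `evaluate`, the `Metrics` container and the eight example labels are hypothetical and chosen only for illustration, with label 1 assumed to mean "diabetic".

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    error_rate: float
    precision: float
    recall: float
    f_measure: float


def evaluate(y_true, y_pred, positive=1):
    """Compute the five measures from true and predicted binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, 1.0 - accuracy, precision, recall, f_measure)


# Made-up labels for illustration only; these are not results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))
```

In practice the predicted labels would come from each fitted classifier on a held-out test split, so that the same function can be applied to all five models for a like-for-like comparison.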
