Diabetes mellitus, commonly referred to simply as diabetes, is a disease in which the human body has elevated levels of glucose (sugar) in the blood. The most commonly occurring forms of diabetes are prediabetes, Type 1, Type 2 and gestational diabetes. Type 2 diabetes is chronic and generally occurs when the human body does not effectively use the insulin that it produces. Type 1 diabetes occurs when the pancreas does not produce enough insulin to meet the body's needs. Prediabetes occurs when blood sugar levels are higher than normal but not as high as in Type 2 diabetes. Gestational diabetes affects pregnant women and is likewise marked by high blood sugar levels. According to the global report by the World Health Organization (WHO), around 422 million people suffer from the disease, and about 1.6 million deaths are attributed directly to diabetes every year. However, timely diagnosis and care of patients through simple lifestyle measures have been shown to keep this deadly disease in check. The main challenge for doctors is the tedious process of identifying, effectively and in good time, the factors that lead to the occurrence of the disease. In recent times this challenge has been addressed through data mining and machine learning techniques. The main aim of this experiment is to design a prediction model that can diagnose the occurrence of diabetes in a patient as accurately as possible. The models have been trained using the WEKA tool, and four supervised machine learning classification algorithms, namely Naïve Bayes, J48, SVM and Neural Networks, have been used to predict the onset of diabetes at an early stage. 
The dataset used here is the Pima Indian Diabetes training dataset (PIDD), which has been acquired from the UCI repository. Chi-squared tests have been applied to this dataset to retain only those attributes that are most strongly associated with the occurrence of diabetes in patients. The performance of each of the classification algorithms has been compared and analyzed based on accuracy, F-measure, recall, precision and ROC curves.
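As a rough illustration of the two steps named above, the following minimal Python sketch scores attributes with a Pearson chi-squared statistic and computes accuracy, precision, recall and F-measure for a set of binary predictions. It is not the WEKA workflow used in this work. The function names (`chi_squared`, `binary_metrics`), the binarized attributes and the eight toy records are all hypothetical and are not taken from PIDD; continuous attributes are assumed to have been discretized before scoring.

```python
from collections import Counter


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic between a categorical attribute and the class."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the attribute and the class were independent.
            expected = f_count * c_count / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical toy records (NOT from PIDD): 1 = diabetic, 0 = non-diabetic.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
attributes = {
    "high_glucose": [1, 1, 1, 0, 0, 0, 0, 1],  # hypothetical binarized attribute
    "thick_skin": [1, 0, 1, 0, 1, 0, 1, 0],    # hypothetical binarized attribute
}

# Rank attributes by chi-squared score, highest first.
scores = {name: chi_squared(values, labels) for name, values in attributes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)

# Hypothetical classifier output, used only to exercise the metric formulas.
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(labels, predictions)
print(metrics)
```

A higher chi-squared score indicates a stronger departure from independence between the attribute and the class label, which is why attributes are ranked in descending order. ROC curves are omitted from the sketch because they require class probability scores rather than hard predictions.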