Abstract

Diabetes is a silent killer that slowly kills a person if it goes undetected. The existing system, which uses the F-score method and K-means clustering to check whether a person has diabetes, is not 100% accurate, and anything less than 100% is not acceptable in the medical field, as it could cost many lives. The proposed system takes some of the best features of the existing algorithms for predicting diabetes and combines them into a novel algorithm that is intended to be 100% accurate in its predictions. With the surge in technological advancements, data mining can be used to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods, we consider factors such as age, BMI, and blood pressure, weigh the overall importance of these attributes, single out the most relevant ones, and use them to predict diabetes.

Highlights

  • As a large nation in Asia, Indonesia continues to develop in various fields to keep pace with the world's growing onset of diseases

  • The proposed feature selection algorithm, M5, aims to improve on the chi-square algorithm by applying clustering techniques, increasing the accuracy of diabetes prediction with a specific number of attributes [21,22,23,24,25]

  • The existing system, which uses the F-score method and K-means clustering to check whether a person has diabetes, is not 100% accurate, and anything less than 100% is not acceptable in the medical field (An efficient feature selection algorithm for health care data analysis, Mythily R.)
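As a rough illustration of the chi-square idea that the proposed algorithm builds on, the sketch below ranks categorical features by their Pearson chi-square statistic against a binary outcome. It is not the M5 algorithm itself, which this page does not specify. The function names (`chi_square_score`, `rank_features`), the banded feature names, and the toy records are all assumptions made for illustration and are not taken from the Pima Indian Diabetes dataset.

```python
from collections import Counter


def chi_square_score(feature_bins, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature_bins, labels))  # counts per (bin, class) cell
    bin_totals = Counter(feature_bins)
    label_totals = Counter(labels)
    score = 0.0
    for b, b_count in bin_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = b_count * c_count / n
            score += (observed[(b, c)] - expected) ** 2 / expected
    return score


def rank_features(table, labels):
    """Return feature names sorted from highest to lowest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical, hand-made records: one band per person for each feature.
toy = {
    "bmi_band": ["high", "high", "low", "low", "high", "low", "high", "low"],
    "age_band": ["old", "young", "old", "young", "old", "young", "young", "old"],
}
outcome = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = diabetic, 0 = not diabetic (made up)

ranking = rank_features(toy, outcome)
```

In this toy data `bmi_band` lines up perfectly with the outcome while `age_band` is evenly split, so `bmi_band` ranks first. A real pipeline would first discretize continuous attributes such as BMI or blood pressure into bands before scoring them this way.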


Summary

Introduction

As a large nation in Asia, Indonesia continues to develop in various fields to keep pace with the world's growing onset of diseases. Healthcare scenarios involve massive data sets that contain values which are not always needed to obtain the desired result. To select the features that are required, we use feature selection to find the values that most affect the output of the prediction, and we work with those to provide the maximum accuracy possible. Data mining is an approach to sorting through large data sets to identify patterns and establish relationships in order to solve problems through data analysis. Association rules are created by analyzing data for frequent if/then patterns and then using the support and confidence criteria to find the most important relationships within the data. Support is how frequently the items appear in the database, while confidence is the proportion of times the if/then statements are found to be true. A classification technique searches for new patterns and may bring about a change in the way the data is organized.
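The support and confidence (certainty) measures described above can be made concrete with a small sketch. The helper names `support` and `confidence`, the item labels, and the records below are hypothetical and chosen only for illustration; they do not come from the paper or its dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often the rule 'if antecedent then consequent' holds when the antecedent occurs."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical patient records, each a set of observed attributes.
records = [
    {"high_bmi", "high_glucose", "diabetic"},
    {"high_bmi", "diabetic"},
    {"high_glucose"},
    {"high_bmi", "high_glucose", "diabetic"},
    {"high_bmi"},
]

rule_support = support(records, {"high_bmi", "diabetic"})
rule_confidence = confidence(records, {"high_bmi"}, {"diabetic"})
```

Here the rule "if high_bmi then diabetic" is supported by three of the five records, and it holds in three of the four records that contain `high_bmi`, so its confidence is about 0.75.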

