Abstract

Data mining (DM) is an efficient tool for extracting hidden information from databases rich in historical data. The mined information provides useful knowledge that helps decision makers make suitable decisions. Because the knowledge that decision makers require differs from one application to another, different mining techniques are needed. Hence, a wide range of mining techniques, such as classification, clustering, association mining, regression analysis, and outlier analysis, is used in practice for knowledge discovery. These techniques rely on various Machine Learning (ML) algorithms, which treat normal objects as highly probable and outliers as having low probability. Global outliers, which occur very rarely, deviate completely from the normal objects and can easily be distinguished by unsupervised ML algorithms. Collective outliers, by contrast, occur rarely but as groups; they also deviate from the normal objects and can be distinguished by ML algorithms. This paper analyzes outliers and class imbalance in diabetes prediction for different ML algorithms, namely logistic regression (LR), decision tree (DT), random forest (RF), K-nearest neighbors (K-NN), and XGBoost (XGB).
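The abstract does not say how outliers or class imbalance are handled, so the following is only a minimal sketch of two common, generic techniques, assuming numeric features and binary labels: Tukey's interquartile-range (IQR) fences, a simple unsupervised rule for flagging global outliers in a single feature, and random oversampling of the minority class. Neither is claimed to be the paper's method. The function names `iqr_outliers` and `random_oversample` are hypothetical, and the sample values are made-up toy numbers, not data from the study.

```python
import random
from collections import Counter
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    A purely statistical, unsupervised rule: no labels are used.
    """
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest (majority) class."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):  # zero iterations for the majority class
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy usage (made-up values, not data from the paper)
glucose = [85, 90, 95, 100, 105, 110, 115, 120, 400]
print(iqr_outliers(glucose))  # [8], i.e. the value 400

rows = [[85], [90], [95], [100], [180]]
labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

When comparing classifiers such as LR, DT, RF, K-NN, and XGB, oversampling of this kind should be applied only to the training split, after the train/test split, so that duplicated minority rows do not leak into the evaluation data and inflate the reported scores.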
