Abstract

The amount of data is increasing rapidly. For example, in 2019 the International Telecommunication Union (ITU) reported that the number of Internet users had reached about 4.1 billion (53.6% of the global population). This volume of data exceeds our ability to analyze it and extract useful information without the help of computational techniques. Data mining is a common technique used in Machine Learning (ML) to extract useful knowledge from big data, and classification algorithms are widely used to achieve accurate prediction. This study compares the accuracy of eight classification techniques, using the confusion matrix as the evaluation model: K-Nearest Neighbor (K-NN), Radial Basis Function Support Vector Machine (RBF SVM), Linear SVM, Sigmoid SVM, Logistic Regression (LR), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), and Naive Bayes (NB). The UCI PIMA Indian Diabetes Dataset is used, and the experiments are run on the Anaconda Python platform. The results show an accuracy of 0.7265 for K-NN, 0.612 for RBF SVM, 0.7721 for Linear SVM, 0.6510 for Sigmoid SVM, 0.7695 for LR, 0.7734 for LDA, 0.6952 for CART, and 0.7551 for NB.
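
As an illustration of the evaluation model mentioned above, accuracy can be read off a binary confusion matrix as the share of correct predictions, (TP + TN) / (TP + FP + FN + TN). The sketch below is a minimal, standard-library example and not the study's own code: the function names `confusion_counts` and `accuracy` are hypothetical, and the label lists are made-up toy data, not values from the PIMA dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix.

    Returns (tp, fp, fn, tn). `positive` is the label treated as the
    positive class (assumed here to be 1, e.g. "diabetic").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Accuracy = correct predictions / all predictions."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Toy labels for illustration only (not taken from the dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy(tp, fp, fn, tn):.4f}")
```

Applying the same calculation to each classifier's predictions on a held-out split would yield one accuracy figure per technique, which is the kind of number reported in the results above.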
