Abstract
Data mining is the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, generalizes known structure so that it can be applied to a new dataset to predict the class of its instances. Various classification algorithms are used to classify data sets, and they are based on different methods such as probability, decision trees, neural networks, nearest neighbors, Boolean and fuzzy logic, and kernels. In this paper, we apply three classification algorithms of diverse nature to ten datasets. The datasets were selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistic, mean absolute error, relative absolute error, and ROC area. A comparative analysis is carried out using the accuracy, precision, and F-measure. We identify the features and limitations of the classification algorithms on datasets of diverse nature.
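Several of the evaluation measures named above have standard definitions. As a minimal illustration (not the tooling used for the paper's experiments), the Python sketch below computes accuracy, one-vs-rest precision, recall and F-measure, and Cohen's Kappa statistic from lists of actual and predicted labels. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count TP, FP and FN for one class treated as 'positive' (one-vs-rest)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, fp, fn


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(actual, predicted, positive):
    """Precision, recall and F-measure (harmonic mean) for one class."""
    tp, fp, fn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cohen_kappa(actual, predicted):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement from the marginal class frequencies of both label lists
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    # Kappa is undefined when chance agreement is 1; this sketch returns 1.0 then
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


if __name__ == "__main__":
    actual = ["yes", "yes", "no", "no", "yes", "no"]
    predicted = ["yes", "no", "no", "yes", "yes", "no"]
    print(accuracy(actual, predicted))
    print(precision_recall_f1(actual, predicted, "yes"))
    print(cohen_kappa(actual, predicted))
```

On these toy labels, four of six predictions are correct, so accuracy is 4/6; for the class "yes", precision, recall and F-measure are all 2/3; and because the balanced marginals give a chance agreement of 0.5, Kappa is (4/6 - 0.5) / 0.5 = 1/3.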
Highlights
With the evolution of computer science and the rapid growth and widespread use of the World Wide Web and other electronic data, information extraction has become a popular research field.
Selected classification algorithms: there are numerous classification algorithms, but we focus on algorithms of diverse nature, and three different algorithms have been chosen.
C4.5 is a well-known decision tree algorithm, whereas Naïve Bayes is a probabilistic algorithm and the Support Vector Machine (SVM) is a kernel-based algorithm.
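C4.5 grows its decision tree by choosing, at each node, an attribute with a high gain ratio (information gain divided by split information). The Python sketch below shows only that selection step for categorical attributes; it omits C4.5's handling of continuous attributes, missing values and pruning, as well as Quinlan's restriction to attributes with at least average information gain. The helper names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, normalized by split information."""
    n = len(labels)
    # Partition the class labels by the value each row takes for attr
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Information gain: reduction in entropy achieved by the split
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values
    split_info = -sum(len(part) / n * math.log2(len(part) / n)
                      for part in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest gain ratio for the current node."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


if __name__ == "__main__":
    rows = [{"outlook": "sunny", "windy": "yes"}, {"outlook": "sunny", "windy": "no"},
            {"outlook": "rainy", "windy": "yes"}, {"outlook": "rainy", "windy": "no"}]
    labels = ["play", "play", "stay", "stay"]
    print(best_attribute(rows, labels, ["outlook", "windy"]))
```

In this toy data "outlook" separates the classes perfectly (gain ratio 1.0) while "windy" carries no information (gain ratio 0.0), so "outlook" is chosen for the split.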
Summary
With the evolution of computer science and the rapid growth and widespread use of the World Wide Web and other electronic data, information extraction has become a popular research field. Data mining [1, 2] is a significant method for extracting information from data. Classification [3, 4] is one of the main domains of data mining and has been used extensively for purposes such as decision making, weather forecasting, predicting customer attitudes, various kinds of social risk analysis, official tasks, and identifying influential bloggers [5,6,7,8,9,10]. Classification proceeds in two phases: the first phase generates the classification model, known as a classifier, which depicts the relationship between attributes and classes, and the second phase applies this model to assign classes to new instances. Most classifiers use probability calculations to assign class labels, and accuracy has not been their explicit target. Naïve Bayes and the C4.5 learning algorithm have been reported to be comparable in predictive accuracy [11,12,13].
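The training and classification phases of a classifier can be illustrated with Naïve Bayes, the probabilistic classifier named above. In this minimal sketch, assuming categorical attributes stored as dictionaries, the training phase counts class and attribute-value frequencies, and the classification phase picks the class with the highest log posterior score. The add-one (Laplace) smoothing, the function names and the toy data are our own illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Phase 1: estimate class frequencies and per-class attribute-value counts."""
    class_counts = Counter(labels)
    # value_counts[class][attribute][value] -> frequency in the training data
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)  # distinct values observed per attribute
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[label][attr][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict_naive_bayes(model, row):
    """Phase 2: return the class with the highest log posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for attr, value in row.items():
            # Add-one smoothing keeps unseen values from zeroing the product
            numerator = value_counts[cls][attr][value] + 1
            denominator = count + len(values_seen[attr])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


if __name__ == "__main__":
    rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rainy"},
            {"outlook": "rainy"}, {"outlook": "rainy"}]
    labels = ["play", "play", "stay", "stay", "play"]
    model = train_naive_bayes(rows, labels)
    print(predict_naive_bayes(model, {"outlook": "sunny"}))
    print(predict_naive_bayes(model, {"outlook": "rainy"}))
```

Working in log space avoids numeric underflow when many attributes are multiplied together. With these five toy rows, in which "sunny" occurs only with "play", the model labels a sunny day "play" and a rainy day "stay".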
More From: Engineering, Technology & Applied Science Research