Abstract

Data mining is the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, generalizes known structure so that it can be applied to new data to predict its class. Many classification algorithms are in use, based on different methods such as probability, decision trees, neural networks, nearest neighbors, Boolean and fuzzy logic, and kernels. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets were selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures including precision, accuracy, F-measure, Kappa statistic, mean absolute error, relative absolute error, and ROC area. The comparative analysis uses accuracy, precision, and F-measure. We identify the features and limitations of the classification algorithms for datasets of diverse nature.
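
As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F-measure, and the Kappa statistic can be computed from true and predicted labels. It assumes a two-class problem with one label treated as positive; the function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper, which may use per-class or weighted averages for multi-class datasets.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute common evaluation measures for a two-class problem.

    Hypothetical helper for illustration only; `positive` is the label
    treated as the positive class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n = tp + fp + fn + tn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up labels, purely to show the call.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

Kappa is included alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when class sizes are unbalanced.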

Highlights

  • Owing to the evolution of computer science and the rapid development and widespread use of the World Wide Web and other electronic data, information extraction is a popular research field

  • There are numerous classification algorithms, but we have focused on algorithms of diverse nature; three different algorithms have been chosen

  • C4.5 is a well-known decision-tree algorithm, whereas Naïve Bayes is a probabilistic algorithm and the Support Vector Machine (SVM) is a kernel-based algorithm

Summary

Introduction

Owing to the evolution of computer science and the rapid development and widespread use of the World Wide Web and other electronic data, information extraction is a popular research field. Data mining [1, 2] is a significant method for extracting information from data. Classification [3, 4] is one of the main domains of data mining and has been used extensively for purposes such as decision making, weather forecasting, predicting customer attitudes, social risk analysis, official tasks, and predicting influential bloggers [5,6,7,8,9,10]. The first phase of classification generates the classification model, known as a classifier, which captures the relationship between characteristics and classes. Most classifiers use probability calculations to assign class labels, and the accuracy measure has not been a target. Naïve Bayes and the C4.5 learning algorithm are comparable in predictive accuracy [11,12,13].

