Abstract

Two important performance indicators for data mining algorithms are the accuracy of classification/prediction and the time taken for training. These indicators are useful for selecting the best algorithms for classification/prediction tasks in data mining. Empirical studies of these performance indicators in data mining are few. Therefore, this study was designed to determine how data mining classification algorithms perform as input data sizes increase. Three data mining classification algorithms, namely Decision Tree, Multi-Layer Perceptron (MLP) Neural Network and Naive Bayes, were subjected to varying simulated data sizes. The training times of the algorithms and the accuracies of their classifications were analyzed for the different data sizes. Results show that Naive Bayes takes the least time to train but is also the least accurate, compared with the MLP and Decision Tree algorithms.

Highlights

  • A large volume of data is poured into our computer networks, the World Wide Web (WWW), and various data storage devices every day from business, society, science and engineering, medicine, and almost every other aspect of daily life

  • From Figure 1(a) and Figure 1(b), it could be inferred that as data sizes increased, the Naïve Bayes classification algorithm's training time was the lowest, followed by J48 (Decision Tree) and the Artificial Neural Network (ANN) (Multi-Layer Perceptron Neural Network), in that order. This means that MLP takes longer to train at each data size than the J48 Decision Tree and Naïve Bayes classifiers

  • Results from this study show that there is a trade-off between accuracy and time complexities of the three algorithms (Multi-layer Perceptron, Naïve Bayes and Decision Tree) used
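The measurement behind these highlights, training time and classification accuracy recorded at several data sizes, can be sketched in a few lines. The following is a minimal, standard-library-only illustration and not the authors' setup: the names `make_data`, `GaussianNaiveBayes` and `benchmark` are hypothetical, the two-class Gaussian data are an assumed stand-in for the study's simulated data, and only a simple Naive Bayes classifier is implemented.

```python
import math
import random
import time
from collections import defaultdict


def make_data(n, seed=0):
    """Simulate n two-feature instances drawn from two overlapping Gaussian classes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label  # class 0 is centered at 0, class 1 at 2 (assumed)
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, rows):
        grouped = defaultdict(list)
        for features, label in rows:
            grouped[label].append(features)
        self.stats = {}
        for label, items in grouped.items():
            columns = list(zip(*items))
            means = [sum(col) / len(col) for col in columns]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((x - m) ** 2 for x in col) / len(col) + 1e-9
                for col, m in zip(columns, means)
            ]
            log_prior = math.log(len(items) / len(rows))
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict(self, features):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(features, means, variances)
            )
        return max(self.stats, key=log_posterior)


def benchmark(model_factory, sizes, test_fraction=0.3):
    """Return (size, training seconds, test accuracy) for each data size."""
    results = []
    for n in sizes:
        rows = make_data(n, seed=n)
        split = int(n * (1 - test_fraction))
        train, test = rows[:split], rows[split:]
        start = time.perf_counter()
        model = model_factory().fit(train)  # only training is timed
        train_seconds = time.perf_counter() - start
        correct = sum(model.predict(f) == y for f, y in test)
        results.append((n, train_seconds, correct / len(test)))
    return results


results = benchmark(GaussianNaiveBayes, [1000, 5000, 20000])
for n, seconds, accuracy in results:
    print(f"n={n:>6}  train={seconds:.4f}s  accuracy={accuracy:.3f}")
```

Any other classifier exposing the same `fit`/`predict` interface, for example a decision tree or an MLP, could be passed to `benchmark` to compare training time against accuracy at each size.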


Summary

Introduction

A large volume of data is poured into our computer networks, the World Wide Web (WWW), and various data storage devices every day from business, society, science and engineering, medicine, and almost every other aspect of daily life. This explosive growth in the volume of available data results from the computerization of our society and the fast development of powerful data collection and storage tools [1]. J. Oyabugbe tions) from very large databases or data warehouses [2]. Data mining consists of more than collecting and managing data; it also includes analysis and prediction.
