Abstract

This article focuses on the quality of data mining algorithms in terms of the accuracy ratio and time consumption. So, in order to figure out the best algorithm among the classification and clustering algorithms, the WEKA program will be testing all algorithms using a real dataset from the size effect on defect proneness for open source software. The Mozilla product is adopted as an example of open source software. The dataset that is used in this paper represents the output of the study of the size effect on defect proneness in the open source software. The study of Mozilla product shows a significant relationship between the size of software and the number of defect proneness in software. The Mozilla product study produced a dataset to be as inputs of the WEKA program in order to compare the data mining tools (algorithms). We use the Naive Bayes, Decision Trees J48, Expectation-maximization for classifying and K-Star and Simple KMeans for clustering methods. The findings demonstrate the difference between the algorithms according to the accuracy, and the time consuming to reach the result in each algorithm. Furthermore, the effect of the software size is significant on defect proneness. Finally, the experiments are conducted in WEKA with the aim of this research is finding out the best algorithm in terms of accuracy and time-consuming. At the end, the paper will be figuring out the best algorithm in order to choose and depending on it in the tests of classification and clustering.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call