Abstract

Data mining techniques, which extract patterns from large databases, have become widespread in all aspects of life. One of the most important data mining tasks is classification, a widely studied topic in many disciplines, including statistics, artificial intelligence, operations research, computer science, and data mining and knowledge discovery. Before a classification algorithm is applied, the data should be preprocessed, because preprocessing can improve classification accuracy. Normalization is one such preprocessing method. In this paper, we selected five applicable normalization methods, normalized the selected data sets with each of them, and calculated the accuracy of the classification algorithm before and after normalization. We used the SVM algorithm for classification because it operates in an n-dimensional space, so normalizing the data sets is expected to improve its results. Finally, Data Envelopment Analysis (DEA) is used to rank the normalization methods. We used four data sets to rank the normalization methods by the increase in accuracy they produce, and then ranked the methods using DEA and the AP model.
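As an illustration of the normalization step described above, the following is a minimal sketch of two widely used methods, min-max scaling and z-score standardization. The excerpt does not name the five methods the paper evaluates, so the choice of these two, the function names, and the toy data are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale a feature linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Center a feature on its mean and scale by its population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_columns(rows, method):
    """Apply a per-feature normalization method to a list of samples."""
    columns = list(zip(*rows))                      # samples -> features
    scaled = [method(list(col)) for col in columns]
    return [list(r) for r in zip(*scaled)]          # features -> samples


# Hypothetical toy data: two features on very different scales.
rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = normalize_columns(rows, min_max)
standardized = normalize_columns(rows, z_score)
```

After either transformation both features lie on a comparable scale, which matters for a distance-based classifier such as an SVM because a feature with a large numeric range would otherwise dominate the others.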

Highlights

  • Data mining and knowledge discovery (DMKD) has made significant progress during the past two decades (Peng et al., 2008)

  • Step 5: Rank the normalization methods with the Data Envelopment Analysis (DEA) method, using the accuracies obtained from the Support Vector Machine (SVM) classification algorithm

  • An efficiency of 1 for all normalization methods means that all of them are efficient, so the AP model should be used to rank them
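To make the last highlight concrete, the standard input-oriented CCR envelopment model and its Andersen-Petersen (AP) super-efficiency variant can be written as below. This is the textbook formulation, not necessarily the exact model used in the paper: the excerpt does not state which quantities serve as inputs and outputs, so the symbols x_{ij} (input i of unit j), y_{rj} (output r of unit j), and the index o of the unit under evaluation are generic assumptions.

```latex
% CCR efficiency of unit o (input-oriented, envelopment form)
\begin{aligned}
\theta_o^{*} = \min_{\theta,\,\lambda}\ & \theta \\
\text{s.t.}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m, \\
             & \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro},          && r = 1,\dots,s, \\
             & \lambda_j \ge 0,                                     && j = 1,\dots,n.
\end{aligned}

% AP super-efficiency: the same program with unit o removed from the reference set
\begin{aligned}
\theta_o^{\mathrm{AP}} = \min_{\theta,\,\lambda}\ & \theta \\
\text{s.t.}\ & \sum_{j \ne o} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m, \\
             & \sum_{j \ne o} \lambda_j y_{rj} \ge y_{ro},          && r = 1,\dots,s, \\
             & \lambda_j \ge 0,                                     && j \ne o.
\end{aligned}
```

In the CCR model every efficient unit scores exactly 1, so efficient units are tied. Because the AP model excludes the evaluated unit from its own reference set, an efficient unit can score above 1, and these scores break the tie.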

Summary

Introduction

Data mining and knowledge discovery (DMKD) has made significant progress during the past two decades (Peng et al., 2008). It utilizes methods, algorithms, and techniques from many disciplines, including statistics, databases, machine learning, pattern recognition, artificial intelligence, data visualization, and optimization (Fayyad, 1996). Classification is an important and widely studied topic in many disciplines, including statistics, artificial intelligence, operations research, computer science, and data mining and knowledge discovery (Chen, Xu, & Chi, 1999). Yang and Liu (1995) followed two years later with experiments of their own on the same data set. They used improved versions of Naive Bayes (NB) and k-nearest neighbors (KNN) but still found that the SVM performed at least as well as all other classifiers they tried.

Normalization
Performance Measures
Data Sources
Experimental Design
Findings
Discussion of Results
Conclusions and Future Work