Abstract
Data mining techniques, which extract patterns from large databases, have become widespread in many aspects of life. One of the most important data mining tasks is classification, a widely studied topic in many disciplines, including statistics, artificial intelligence, operations research, computer science, and data mining and knowledge discovery. An important step before applying a classification algorithm is preprocessing, which can improve the algorithm's accuracy. Preprocessing comprises various methods, one of which is normalization. In this paper, we select five applicable normalization methods, normalize the selected data sets with each of them, and calculate the accuracy of the classification algorithm before and after normalization. The support vector machine (SVM) algorithm is used for classification because it operates in an n-dimensional space, so normalizing the data sets is expected to improve its results. Finally, Data Envelopment Analysis (DEA) is used to rank the normalization methods. We use four data sets to rank the normalization methods by the increase in accuracy they yield, applying DEA and the AP model to produce the ranking.
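To make the normalization step concrete, the following is a minimal sketch of two widely used normalization methods, min-max scaling and z-score standardization, applied column by column to a small data set. The abstract does not name the five methods that were compared, so choosing these two is an assumption made only for illustration, and the function names (`min_max_normalize`, `z_score_normalize`, `normalize_dataset`) are hypothetical rather than taken from the paper.

```python
import math


def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Rescale one feature column linearly into [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to new_min.
        return [new_min for _ in column]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in column]


def z_score_normalize(column):
    """Shift one feature column to zero mean and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation (divides by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def normalize_dataset(rows, method):
    """Apply a column-wise normalization method to a list of feature rows."""
    columns = zip(*rows)  # transpose rows into feature columns
    normalized = [method(list(col)) for col in columns]
    return [list(row) for row in zip(*normalized)]  # transpose back to rows


# Hypothetical data: two features on very different scales.
rows = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
scaled_rows = normalize_dataset(rows, min_max_normalize)
```

After this step every feature lies in the same range, so no single feature dominates the distances an SVM computes in the n-dimensional feature space. In an experiment of the kind described, the classifier would be trained once on `rows` and once on `scaled_rows`, and the two accuracies compared.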
Highlights
Data mining and knowledge discovery (DMKD) has made substantial progress during the past two decades (Peng et al., 2008).
Step 5: Rank the normalization methods with Data Envelopment Analysis (DEA), based on the accuracies obtained from the Support Vector Machine (SVM) classification algorithm.
An efficiency of 1 for all normalization methods means that all of them are efficient, so the AP model should be used to rank them.
Summary
Data mining and knowledge discovery (DMKD) has made substantial progress during the past two decades (Peng et al., 2008). It utilizes methods, algorithms, and techniques from many disciplines, including statistics, datasets, machine learning, pattern recognition, artificial intelligence, data visualization, and optimization (Fayyad, 1996). Classification is an important and widely studied topic in many disciplines, including statistics, artificial intelligence, operations research, computer science, and data mining and knowledge discovery (Chen, Xu, & Chi, 1999). Yang and Liu (1995) followed two years later with experiments of their own on the same data set. They used improved versions of Naive Bayes (NB) and k-nearest neighbors (KNN) but still found that the SVM performed at least as well as all the other classifiers they tried.