Abstract

The aim of this study is to select the significant features that contribute to classification accuracy. Data warehouses hold large volumes of data, only some of which are useful for mining. Running classification on such large, uneven data sets with many features wastes time, degrades the efficiency of the classification algorithms, and yields less accurate results. We therefore propose a system that first uses PCA (Principal Component Analysis) to select attributes and then performs classification on them using a Bayesian classifier, a Multi-Layer Perceptron, and the J48 decision tree. This gave better results than performing classification on the complete data sets with all attributes. Association rule mining with the traditional Apriori algorithm is also used to find the subset of features related to the class label. The experiments were conducted using the WEKA 3.6.0 tool.

Highlights

  • Data mining is a field in which huge amounts of data are mined from data warehouses

  • Classification falls into two categories, supervised and unsupervised. In supervised classification the class label is already known before classification; in unsupervised classification the label is not known beforehand and must be discovered from the training sets and then applied to the test data

  • This study proposes a method in which classification is applied only to the important attributes. Two feature selection techniques, PCA (Principal Component Analysis) and association rule mining, select the subset of attributes significant for classification
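
As an illustration of how association rule mining can point to attributes related to the class label, the following is a minimal sketch in plain Python. It is not the WEKA setup used in the study: the toy transactions, the attribute names, the `class=` prefix, and the support and confidence thresholds are all hypothetical. It mines frequent itemsets level by level in the Apriori style, keeps only rules whose consequent is a class label, and collects the attributes that appear on the left-hand side of those rules.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    n = len(transactions)
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent

def class_rules(frequent, class_prefix, min_confidence):
    """Keep rules of the form {attribute items} -> class label."""
    rules = []
    for itemset, support in frequent.items():
        labels = [i for i in itemset if i.startswith(class_prefix)]
        if len(labels) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - set(labels)
        confidence = support / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, labels[0], support, confidence))
    return rules

# Hypothetical toy data: each record is a set of "attribute=value" items.
transactions = [
    {"outlook=sunny", "windy=no", "temp=hot", "class=play"},
    {"outlook=sunny", "windy=no", "temp=mild", "class=play"},
    {"outlook=sunny", "windy=yes", "temp=hot", "class=play"},
    {"outlook=rainy", "windy=yes", "temp=mild", "class=stay"},
    {"outlook=rainy", "windy=no", "temp=hot", "class=stay"},
    {"outlook=rainy", "windy=yes", "temp=mild", "class=stay"},
]

frequent = apriori(transactions, min_support=0.5)
rules = class_rules(frequent, class_prefix="class=", min_confidence=0.9)

# Attributes that occur in the antecedent of at least one class rule.
selected = {item.split("=")[0] for antecedent, *_ in rules for item in antecedent}
print(sorted(selected))
```

On real data the thresholds would need tuning, and numeric attributes would have to be discretized into items before mining.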

Summary

INTRODUCTION

Data mining is a field in which huge amounts of data are mined from data warehouses. This study proposes a method in which classification is applied only to the important attributes, which are chosen with two feature selection techniques, PCA (Principal Component Analysis) and association rule mining, that select the subset of attributes significant for classification. The Multi-Layer Perceptron is a neural-network-based classification algorithm that takes a long time to execute but produces accurate results. In our proposal we chose J48 as the decision tree algorithm, Bayes as the Bayesian algorithm, and the Multi-Layer Perceptron as the neural-network-based algorithm because they are regarded as the best in their respective families of classification techniques. The study by Phyu (2009) compares classification algorithms on the accuracy of their results and shows that decision tree and Bayes classification techniques are well suited to achieving good accuracy. The time taken in training to model the relationship between the independent variables and the dependent variables is large (Rajeswari and Vaithiyanathan, 2012c).
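
The reduce-then-classify idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the WEKA 3.6.0 workflow used in the study: the toy data set is hypothetical, PCA is computed by power iteration on the covariance matrix, and a Gaussian naive Bayes model stands in for the Bayes classifier (J48 and the Multi-Layer Perceptron are not shown). All function names are made up for this sketch.

```python
import math

def fit_pca(rows, k, iters=200):
    """Return (means, components) for the k leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Fixed start vector keeps the sketch deterministic; a random start
        # is the safer choice in general.
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract the component just found so the next pass
        # converges to the next principal direction.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return means, components

def project(row, means, components):
    """Map one record onto the retained principal components."""
    return [sum(c[j] * (row[j] - means[j]) for j in range(len(row)))
            for c in components]

def fit_gaussian_nb(X, y, eps=1e-6):
    """Per class: log prior, feature means, feature variances."""
    model = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*pts))
        mu = [sum(c) / len(pts) for c in cols]
        var = [sum((v - m) ** 2 for v in c) / len(pts) + eps
               for c, m in zip(cols, mu)]
        model[label] = (math.log(len(pts) / len(X)), mu, var)
    return model

def predict(model, x):
    """Pick the class with the highest log posterior."""
    def log_post(params):
        log_prior, mu, var = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, mu, var))
    return max(model, key=lambda label: log_post(model[label]))

# Hypothetical data: the first two attributes carry the class signal,
# the last two are near-constant noise.
X = [[1.0, 2.1, 0.5, 0.4], [1.2, 2.4, 0.4, 0.5], [0.8, 1.7, 0.6, 0.5],
     [1.1, 2.0, 0.5, 0.6], [3.0, 6.1, 0.5, 0.5], [3.2, 6.3, 0.6, 0.4],
     [2.9, 5.8, 0.4, 0.6], [3.1, 6.2, 0.5, 0.5]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

means, components = fit_pca(X, k=1)            # keep one component out of four
Z = [project(r, means, components) for r in X]
model = fit_gaussian_nb(Z, y)

test_rows = [[1.0, 2.0, 0.5, 0.5], [3.0, 6.0, 0.5, 0.5]]
predictions = [predict(model, project(r, means, components)) for r in test_rows]
print(predictions)
```

The classifier here sees one derived feature instead of four raw attributes, which is the efficiency argument the proposal makes. How many components to keep is a tuning decision that this sketch does not address.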

PROPOSED METHODOLOGY
Findings
CONCLUSION