Abstract

Dimension reduction and feature selection are two important data preprocessing techniques in data mining applications. We present the effect of this preprocessing on seven classification algorithms: Random Forest, Naive Bayes, Support Vector Machine, Convolutional Neural Network, Decision Tree, K-Nearest Neighbors, and Multi-layer Perceptron. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are used for dimension reduction; the Gini index and entropy are used for feature selection. Experiments are run on three datasets. For the Diabetic Retinopathy Debrecen Dataset, neither dimension reduction nor feature selection has a significant influence on classification accuracy. For the Online Shoppers Purchasing Intention Dataset, entropy-based feature selection has a significant influence on classification accuracy. For the SPECTF Heart Dataset, SVD dimension reduction has some influence on classification accuracy. In our experiments, PCA and SVD dimension reduction and entropy-based feature selection produced higher classification accuracies than those reported earlier for the Backward Feature Elimination dimension reduction method.
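
The abstract does not say how the Gini index and entropy scores were turned into a feature ranking. As an illustration only, the sketch below assumes categorical features and ranks them by the impurity reduction obtained when the class labels are split on each feature's values. The function names (`entropy`, `gini`, `impurity_reduction`, `rank_features`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(feature, labels, impurity):
    """Impurity of the labels minus the weighted impurity after
    grouping them by the values of one categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def rank_features(columns, labels, impurity=entropy):
    """Return feature names sorted from most to least informative."""
    scores = {
        name: impurity_reduction(col, labels, impurity)
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up for illustration): one feature that separates the
# classes perfectly and one that carries no information about them.
labels = [1, 1, 0, 0]
columns = {"noise": [0, 1, 0, 1], "informative": [1, 1, 0, 0]}

ranking_entropy = rank_features(columns, labels, impurity=entropy)
ranking_gini = rank_features(columns, labels, impurity=gini)
```

With either criterion, keeping only the top-ranked features is one simple form of feature selection; continuous features would first need to be binned or handled with threshold splits, which this sketch does not cover.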
