Abstract

The world is surrounded by a huge amount of data that increases day by day. This growth leads to high-dimensional data, which poses a challenge in data mining because it is costly in computation time and memory space. Moreover, high-dimensional data can degrade the classification accuracy of machine learning algorithms due to redundant and irrelevant features, a problem known as the Curse of Dimensionality. Dimensionality reduction via feature extraction techniques has therefore been introduced to address this problem. Since high-dimensional data limits the performance of some machine learning models, this paper introduces two common dimensionality reduction methods and conducts an empirical comparison between Principal Component Analysis (PCA) and Auto-Encoders (AE) to study their effect on improving classification performance on high-dimensional data. The study uses three classification models, K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Random Forest (RF), to perform classification on the MNIST and Fashion-MNIST datasets. The results were compared and analyzed, and they show that AE has a better effect on improving the performance of the KNN and SVM classifiers on the MNIST dataset: SVM accuracy improves from 94% to 97% when AE is used to reduce the dimension by 95%, 90%, and 50%, and KNN accuracy improves from 91% to 95% at the same reduction percentages. On the Fashion-MNIST dataset, the accuracy of the same classifier (KNN) improves from 81% to 83% when the dimension is reduced by AE by 90% and 50%.
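The pipeline the abstract describes (reduce dimensionality, then classify) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses scikit-learn's small `digits` dataset as a stand-in for MNIST, PCA as the reduction step, and an SVM classifier; the 50% reduction ratio (64 features to 32 components) mirrors one of the percentages reported in the study.

```python
# Hedged sketch of the "reduce then classify" pipeline: PCA for
# dimensionality reduction followed by SVM classification.
# NOTE: the paper uses MNIST/Fashion-MNIST and also evaluates an
# auto-encoder; here sklearn's 8x8 `digits` set stands in for brevity.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 64 features per image
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Reduce dimensionality by ~50% (64 -> 32 components), then classify.
clf = make_pipeline(StandardScaler(), PCA(n_components=32), SVC())
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy with PCA(32) + SVM: {acc:.3f}")
```

Swapping `PCA(n_components=32)` for an auto-encoder's bottleneck representation (e.g. a small Keras encoder) reproduces the study's second reduction method; the classifier stage stays the same.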
