Abstract

Because high-dimensional data degrade the performance of the k-Nearest Neighbor (k-NN) classification algorithm, dimension reduction for big data classification with k-NN has drawn considerable attention from both industry and academia. Popular dimension reduction techniques such as PCA, LDA, and SVD are data-dependent: the projection matrix is generated from the input data, which makes reusing the projection matrix impractical. In this paper, a Data-Independent Reusable Projection (DIRP) technique is proposed to project high-dimensional data to a low-dimensional space, and it is shown how the projection matrix can be reused for any dataset with the same number of dimensions. The proposed DIRP method preserves the distance between any two points in the dataset, which suits distance-based classification algorithms such as k-NN. The DIRP method has been implemented in R, and a new package, “RandPro”, for generating the projection matrix has been developed and tested on the CIFAR-10, MNIST handwritten digit recognition, and human activity recognition datasets. Two versions of the RandPro package have been uploaded to the CRAN repository. The running time and classification metrics of PCA and the DIRP method are compared. The results show that the proposed method significantly reduces running time while achieving accuracy nearly equivalent to that obtained on the original data.
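The idea of a data-independent, reusable projection can be illustrated with a minimal sketch. It assumes a plain Gaussian random projection scaled by 1/sqrt(k), which is a standard construction and not necessarily the exact matrix used by DIRP or RandPro. The helper names (`make_projection`, `project`, `distance_ratios`), the dimensions, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import math
import random

def make_projection(d, k, seed=0):
    """Build a k x d Gaussian matrix scaled by 1/sqrt(k).

    The matrix depends only on d, k and the seed, never on any dataset,
    so it can be generated once and reused (assumed construction).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map one d-dimensional point x to k dimensions using matrix R."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def dist(a, b):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def distance_ratios(R, points):
    """Ratio of projected to original distance for every pair of points."""
    low = [project(R, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratios.append(dist(low[i], low[j]) / dist(points[i], points[j]))
    return ratios

d, k = 1000, 200                      # original and reduced dimensions (illustrative)
R = make_projection(d, k, seed=42)    # generated once, without looking at any data

# Two unrelated synthetic datasets that share only the dimension d.
data_rng = random.Random(7)
dataset_a = [[data_rng.random() for _ in range(d)] for _ in range(5)]
dataset_b = [[data_rng.gauss(0.0, 3.0) for _ in range(d)] for _ in range(5)]

# The same matrix R is applied to both datasets.
ratios_a = distance_ratios(R, dataset_a)
ratios_b = distance_ratios(R, dataset_b)
print(min(ratios_a + ratios_b), max(ratios_a + ratios_b))
```

Ratios close to 1 mean that pairwise distances, and therefore k-NN neighborhoods, are roughly unchanged after projection, even though the matrix was built without reference to either dataset.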

