Abstract

The k-nearest neighbor (KNN) algorithm is a simple and widely used classification method in machine learning. To classify or predict a target, it searches every object in the dataset for the target's nearest neighbors, so its runtime grows long on big datasets. Several articles have discussed improved KNN algorithms based on the KD-Tree storage structure. However, the search time of a KD-Tree grows rapidly as the dimensionality of the dataset increases. This paper therefore proposes a new improved KNN algorithm based on principal component analysis (PCA) and the KD-Tree data structure. Combining these two techniques could significantly increase the efficiency of the classification process. The features of PCA and KD-Tree relevant to the proposed algorithm are discussed, and the specific steps of the new method are stated. The new KNN algorithm is applied in two experiments implemented in Python. The results show that its efficiency improves greatly under certain conditions and that its accuracy remains good. However, the experimental results also reveal several potential drawbacks. Further improving the proposed algorithm will require more advanced techniques and additional experiments.
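
The combination described above can be illustrated with a minimal sketch. It is not the paper's code. It assumes that PCA is fit on the training set, that a KD-Tree is built over the reduced coordinates, and that a query is projected with the same PCA before a majority vote among its k nearest neighbors. The names `PCAKDTreeKNN`, `pca_fit`, `build_kdtree`, and `kdtree_knn`, and the parameters `k` and `n_components`, are hypothetical. Only NumPy and the standard library are used.

```python
import heapq
from collections import Counter

import numpy as np


def pca_fit(X, n_components):
    """Return (mean, components) for projecting X onto its top principal axes."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project (centered) rows of X onto the retained principal directions."""
    return (X - mean) @ components.T


class KDNode:
    def __init__(self, index, axis, left, right):
        self.index = index  # row index of the splitting point in the training data
        self.axis = axis    # coordinate this node splits on
        self.left = left
        self.right = right


def build_kdtree(points, indices, depth=0):
    """Recursively split at the median, cycling through the axes."""
    if len(indices) == 0:
        return None
    axis = depth % points.shape[1]
    order = indices[np.argsort(points[indices, axis])]
    mid = len(order) // 2
    return KDNode(
        index=order[mid],
        axis=axis,
        left=build_kdtree(points, order[:mid], depth + 1),
        right=build_kdtree(points, order[mid + 1:], depth + 1),
    )


def kdtree_knn(node, points, query, k, heap):
    """Collect the k nearest training indices in a max-heap of (-sq_dist, index)."""
    if node is None:
        return
    point = points[node.index]
    dist = float(np.sum((point - query) ** 2))
    if len(heap) < k:
        heapq.heappush(heap, (-dist, node.index))
    elif dist < -heap[0][0]:
        # Replace the current farthest of the k candidates.
        heapq.heapreplace(heap, (-dist, node.index))
    diff = query[node.axis] - point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    kdtree_knn(near, points, query, k, heap)
    # Search the far side only if the splitting plane is closer than the k-th neighbor.
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        kdtree_knn(far, points, query, k, heap)


class PCAKDTreeKNN:
    """Hypothetical PCA + KD-Tree KNN classifier (illustrative sketch)."""

    def __init__(self, k=3, n_components=2):
        self.k = k
        self.n_components = n_components

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        self.mean, self.components = pca_fit(X, self.n_components)
        self.Z = pca_transform(X, self.mean, self.components)
        self.root = build_kdtree(self.Z, np.arange(len(self.Z)))
        return self

    def predict(self, X):
        Zq = pca_transform(np.asarray(X, dtype=float), self.mean, self.components)
        labels = []
        for q in Zq:
            heap = []
            kdtree_knn(self.root, self.Z, q, self.k, heap)
            # Majority vote among the k nearest neighbors in the reduced space.
            votes = Counter(self.y[i] for _, i in heap)
            labels.append(votes.most_common(1)[0][0])
        return np.array(labels)
```

Reducing the dimensionality first keeps the KD-Tree in a low-dimensional space, where the far-branch pruning test succeeds often. In high dimensions that test rarely prunes anything, and the search degrades toward a brute-force scan. The trade-off is that neighbors are found in the projected space, so accuracy depends on how much class structure the retained components preserve.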
