Abstract
The k-nearest neighbor (KNN) algorithm is a simple and widely used classification method in machine learning. To classify a target, it searches every object in the dataset for the k nearest neighbors, so its runtime grows as datasets become large. Several articles have discussed improved KNN algorithms based on the KD-Tree storage structure. However, the time complexity of the KD-Tree grows rapidly as the dimensionality of the dataset increases. This paper therefore proposes a new improved KNN algorithm based on principal component analysis (PCA) and the KD-Tree data structure. Combining these two techniques could significantly increase the efficiency of the classification process. The features of PCA and the KD-Tree that are relevant to the proposed algorithm are discussed, and the specific steps of the new method are stated. The new KNN algorithm is applied in two experiments implemented in Python. The results show that its efficiency improves greatly in certain situations while its accuracy remains good. However, the experimental results also reveal several potential drawbacks. Further improving the proposed algorithm will require more advanced techniques and additional experiments in future work.
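The abstract does not spell out how the two techniques are combined, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that PCA first reduces the dimensionality of the data, that the KD-Tree is then built on the reduced points, and that each query is projected the same way before the tree is searched. To stay within the Python standard library, PCA is approximated here by power iteration with deflation, a swapped-in technique; a practical version would more likely use a library eigendecomposition. All function names, the synthetic data, `n_components=2`, and `k=3` are hypothetical choices for illustration.

```python
import heapq
import math
import random
from collections import Counter


def pca_fit(X, n_components, n_iter=100):
    """Estimate the mean and top principal directions by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the eigenvalue; deflate to find the next one.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mean, components


def pca_transform(x, mean, components):
    """Project one sample onto the principal directions."""
    return [sum((xj - mj) * cj for xj, mj, cj in zip(x, mean, c))
            for c in components]


def build_kdtree(points, depth=0):
    """Build a KD-Tree from (vector, label) pairs, splitting at the median."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn_search(node, query, k, heap):
    """Fill heap with the k nearest (negated squared distance, label) pairs."""
    if node is None:
        return
    (vec, label), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, label))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, label))
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn_search(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th neighbor (or fewer than k neighbors are known yet).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_search(far, query, k, heap)


def classify(tree, mean, components, x, k=3):
    """Project x with PCA, then take a majority vote among its k neighbors."""
    heap = []
    knn_search(tree, pca_transform(x, mean, components), k, heap)
    return Counter(label for _, label in heap).most_common(1)[0][0]


# Synthetic demo data (an assumption): two well-separated 6-D clusters.
rng = random.Random(42)
train = [([rng.gauss(5.0 * label, 1.0) for _ in range(6)], label)
         for label in (0, 1) for _ in range(40)]
mean, components = pca_fit([x for x, _ in train], n_components=2)
tree = build_kdtree([(pca_transform(x, mean, components), y) for x, y in train])

print(classify(tree, mean, components, [0.0] * 6))
print(classify(tree, mean, components, [5.0] * 6))
```

The two queries sit at the centers of the synthetic clusters, so they should receive labels 0 and 1 respectively. Because the tree is built on 2-D projections rather than the original 6-D points, each search prunes on fewer axes, which is the kind of saving the abstract attributes to combining PCA with the KD-Tree.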