Abstract

Owing to its lack of assumptions about the underlying data distribution and its strong generalization ability, the k-nearest neighbor (kNN) classification algorithm is widely used in face recognition, text classification, sentiment analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and all training instances during prediction, so it is difficult to apply to large-scale data. To overcome this difficulty, a growing number of acceleration algorithms based on data partition have been proposed, but they lack a theoretical analysis of the effect of data partition on classification performance. This paper provides such an analysis using empirical risk minimization and proposes a large-scale k-nearest neighbor classification algorithm based on neighbor relationship preservation. The search for the nearest neighbors is cast as a constrained optimization problem, and an estimate is derived for the difference in the objective function value between the optimal solutions with and without data partition. According to this estimate, minimizing the similarity between instances in different partitioned subsets largely reduces the effect of data partition. The minibatch k-means clustering algorithm is chosen to perform the partition for its effectiveness and efficiency. Finally, the nearest neighbors of a test instance are searched in a set built by successively merging candidate subsets until the neighbors no longer change, where the candidate subsets are selected by the similarity between the test instance and the cluster centers. Experimental results on public datasets show that the proposed algorithm largely retains the same nearest neighbors as the original kNN algorithm, with no significant difference in classification accuracy, and achieves better results than two state-of-the-art algorithms.
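The following Python sketch illustrates the search strategy the abstract describes, assuming Euclidean distance as the dissimilarity measure; the class name `PartitionedKNN` and all parameter defaults are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the abstract's search strategy, assuming Euclidean distance.
# Names (PartitionedKNN, n_clusters, k) are hypothetical, for illustration only.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

class PartitionedKNN:
    def __init__(self, n_clusters=50, k=5):
        self.k = k
        self.kmeans = MiniBatchKMeans(n_clusters=n_clusters)

    def fit(self, X, y):
        self.X, self.y = np.asarray(X), np.asarray(y)
        labels = self.kmeans.fit_predict(self.X)
        # Group training-instance indices by cluster: this is the data partition.
        self.clusters = [np.flatnonzero(labels == c)
                         for c in range(self.kmeans.n_clusters)]
        return self

    def _k_nearest(self, x, idx):
        # Exact kNN restricted to the candidate index set idx.
        d = np.linalg.norm(self.X[idx] - x, axis=1)
        return idx[np.argsort(d)[:self.k]]

    def predict_one(self, x):
        # Rank candidate subsets by the distance from x to the cluster centers.
        center_d = np.linalg.norm(self.kmeans.cluster_centers_ - x, axis=1)
        ranked = np.argsort(center_d)
        candidate = self.clusters[ranked[0]]
        neighbors = self._k_nearest(x, candidate)
        # Successively merge the next-closest subset until the neighbors stop changing.
        for c in ranked[1:]:
            candidate = np.concatenate([candidate, self.clusters[c]])
            new_neighbors = self._k_nearest(x, candidate)
            if np.array_equal(np.sort(new_neighbors), np.sort(neighbors)):
                break
            neighbors = new_neighbors
        # Majority vote over the neighbor labels.
        values, counts = np.unique(self.y[neighbors], return_counts=True)
        return values[np.argmax(counts)]
```

Because merging stops as soon as the k nearest neighbors stabilize, a query typically touches only a few clusters rather than the whole training set, which is where the speedup over brute-force kNN comes from.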

Highlights

  • The k-nearest neighbor classification algorithm is a lazy learning method that does not require a training process but instead stores the training instances [1]

  • We propose a novel algorithm that analyzes the effect of data partition on the classification performance of the k-nearest neighbor (kNN) classification algorithm and largely preserves the same k nearest neighbors as the original algorithm

  • Unlike previous improved kNN classification algorithms based on data partition, the proposed algorithm studies the effect of data partition theoretically from the perspective of optimization, and it shows that minimizing the similarity between instances in different partitioned subsets is the key factor for the generalization ability of the classifier

Introduction

The k-nearest neighbor classification algorithm is a lazy learning method that does not require a training process but instead stores the training instances [1]. Partition-based acceleration algorithms divide the feature space of the training set into several subregions, determine which subregion a test instance belongs to, and find the k nearest neighbors within the subset of instances corresponding to that region. These algorithms exploit the local learning characteristic of kNN classification: the predicted label of a test instance depends only on the most similar instances in the training set. However, the time complexity of most existing instance selection algorithms is quadratic in the training set size, which makes it difficult to process large-scale data effectively. Moreover, searching only one subregion uses part of the data rather than all of it, so generalization performance could be negatively affected.
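As a point of reference, the basic partition-based lookup described above can be sketched as follows, again assuming Euclidean distance; the helper `knn_in_nearest_region` and its signature are hypothetical.

```python
# Minimal sketch of the basic partition-based lookup: the test instance is
# assigned to its closest subregion and neighbors are searched only there.
# The function name and signature are illustrative, not from the paper.
import numpy as np

def knn_in_nearest_region(x, X, y, centers, region_idx, k=5):
    """Find the k nearest neighbors of x using only the closest subregion.

    X, y       : training instances and labels
    centers    : one representative point per subregion
    region_idx : list of index arrays, one per subregion
    """
    # Pick the subregion whose representative is closest to the test instance.
    nearest_region = np.argmin(np.linalg.norm(centers - x, axis=1))
    idx = region_idx[nearest_region]
    # Exact kNN restricted to that subregion: cost O(|region|) per query
    # instead of O(n), but true neighbors lying just across the partition
    # boundary can be missed, which is the generalization risk noted above.
    d = np.linalg.norm(X[idx] - x, axis=1)
    neighbors = idx[np.argsort(d)[:k]]
    values, counts = np.unique(y[neighbors], return_counts=True)
    return neighbors, values[np.argmax(counts)]
```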
