Abstract

k-nearest-neighbor (kNN) algorithms are simple, effective, online multi-category learners, which makes them widely applicable across many fields. However, almost all kNN variants suffer from low accuracy on imbalanced data. In this paper, we observe that the nearest neighbors are not always the best samples for prediction, and we propose a reference-point based k Neighbors algorithm (RPkN). Instead of computing the distance between two data points directly, the proposed algorithm measures proximity through a set of reference points. This allows it to find neighbors that are more informative than the strictly nearest ones and mitigates the low accuracy of kNN on imbalanced data sets, so the proposed method achieves higher average accuracy than existing exact kNNs. In addition, RPkN avoids computing the distance between every pair of data points without relying on any tree structure, which reduces the preprocessing time complexity to O(n log n); classifying a new data point takes O(log n). This is lower than almost all existing exact kNN methods. The low time complexity and the absence of tree structures also make RPkN much easier to parallelize for large-scale data processing than other fast exact kNNs that depend on tree structures.
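The abstract only sketches the reference-point idea, so the following is a minimal illustrative sketch rather than the authors' actual RPkN algorithm. It assumes a single reference point (the data centroid), a one-dimensional ordering of training points by their distance to that reference point, and a binary-search lookup at query time; these choices are assumptions made here to show how sorting (O(n log n)) plus binary search (O(log n)) can replace pairwise distance computation.

```python
# Illustrative sketch only: the reference point, the 1-D ordering by
# reference distance, and the binary-search lookup are assumptions used
# to show the general idea, not the paper's exact RPkN construction.
import bisect
from collections import Counter

import numpy as np


class ReferencePointKNeighbors:
    """Toy classifier that ranks neighbors by their distance to a reference
    point instead of by pairwise distance to the query."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Assumed choice: use the data centroid as the single reference point.
        self.reference_ = X.mean(axis=0)
        # O(n log n): sort training points by their distance to the reference.
        ref_dist = np.linalg.norm(X - self.reference_, axis=1)
        order = np.argsort(ref_dist)
        self.sorted_dist_ = ref_dist[order].tolist()
        self.sorted_labels_ = np.asarray(y)[order]
        return self

    def predict_one(self, x):
        # O(log n): locate the query's reference distance in the sorted list.
        d = float(np.linalg.norm(np.asarray(x, dtype=float) - self.reference_))
        i = bisect.bisect_left(self.sorted_dist_, d)
        lo = max(0, i - self.k)
        hi = min(len(self.sorted_dist_), i + self.k)
        # Among the candidates bracketing the query, keep the k whose
        # reference distances are closest to the query's, then majority vote.
        candidates = sorted(
            range(lo, hi), key=lambda j: abs(self.sorted_dist_[j] - d)
        )[: self.k]
        votes = Counter(self.sorted_labels_[j] for j in candidates)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Small usage example: two classes separated by distance from the origin,
    # so a centroid reference point happens to capture the label structure.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)
    clf = ReferencePointKNeighbors(k=5).fit(X, y)
    print(clf.predict_one([0.1, 0.2]), clf.predict_one([2.0, 2.0]))
```

In this toy setting the sorted array plays the role that a tree structure plays in other fast exact kNNs, which is why the sketch needs only sorting at training time and a binary search per query.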
