A fast neighborhood classifier based on hash bucket with application to medical diagnosis

Jiayu Xiao, Qinghua Zhang, Zhihua Ai, Guoyin Wang

doi:10.1016/j.ijar.2022.05.012

Abstract

In medical diagnosis, costs rise significantly as the volume of medical information grows, and data mining methods can reduce them. The neighborhood classifier, an extension of the neighborhood rough set, has become an intuitive and effective classification method in data mining. However, two defects still limit its performance. First, most existing neighborhood classifiers incur high computational complexity when obtaining the neighborhood of each sample. Second, differences between samples within the neighborhood are ignored during classification, which diminishes the classification ability of the model. To address these problems, this paper introduces hash buckets and a distance voting rule and proposes a fast neighborhood classifier based on hash bucket (FNC-HB). First, all samples are mapped to their corresponding buckets through a hash function. Next, for any test sample, a bucket-based adaptive neighborhood classification radius is defined, which eliminates the manually set parameter. Then, to avoid the indiscernibility of the traditional voting rule when predicting the label of a test sample, a new rule called the distance voting rule is defined. Finally, experimental results show that FNC-HB achieves better classification performance and computational efficiency, demonstrating that it is feasible and effective.
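The pipeline outlined in the abstract (hash samples into buckets, search a bucket-based neighborhood, vote by distance) can be illustrated with a minimal sketch. The abstract does not give the hash function, the radius definition, or the voting weights, so everything concrete here is an assumption chosen for illustration and not the FNC-HB definition: the hash is the Euclidean norm divided by a hypothetical `width`, the radius is set equal to `width`, and votes are weighted by inverse distance. The function names `bucket_of`, `build_buckets`, and `predict` are likewise hypothetical.

```python
import math
from collections import defaultdict


def bucket_of(x, width):
    # Assumed hash: bucket index from the Euclidean norm of the sample.
    return int(math.sqrt(sum(v * v for v in x)) // width)


def build_buckets(samples, labels, width):
    # Map every training sample to its bucket once, up front.
    buckets = defaultdict(list)
    for x, y in zip(samples, labels):
        buckets[bucket_of(x, width)].append((x, y))
    return buckets


def predict(buckets, x, width):
    k = bucket_of(x, width)
    # By the triangle inequality, any sample within distance `width` of x
    # has a norm within `width` of x's norm, so it lies in bucket k-1, k, or k+1.
    candidates = [s for b in (k - 1, k, k + 1) for s in buckets.get(b, [])]
    scores = defaultdict(float)
    for s, y in candidates:
        d = math.dist(x, s)
        if d <= width:  # assumed radius: one bucket width
            # Assumed weighting: closer neighbors contribute larger votes.
            scores[y] += 1.0 / (d + 1e-9)
    if not scores:
        return None  # empty neighborhood
    return max(scores, key=scores.get)


samples = [(1.0, 0.0), (1.1, 0.0), (1.9, 0.0)]
labels = ["a", "a", "b"]
buckets = build_buckets(samples, labels, width=1.0)
prediction = predict(buckets, (1.8, 0.0), width=1.0)
```

In this toy data, a plain majority vote over the three neighbors would favor label `a`, while the distance-weighted vote favors the much closer `b` sample. Only three buckets are scanned per query instead of the whole training set, which is the kind of saving a bucket scheme is meant to provide.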

Full Text