Abstract

One of the best-known and most effective methods in supervised classification is the k-nearest neighbors algorithm (kNN). Several approaches have been proposed to improve its accuracy, and fuzzy approaches prove to be among the most successful, most notably the classical fuzzy k-nearest neighbors algorithm (FkNN). However, these traditional algorithms fail to tackle the large amounts of data that are available today. There are multiple alternatives for enabling kNN classification on big datasets, notably the approximate version of kNN based on hybrid spill trees. Nevertheless, the existing proposals of FkNN for big data problems are not fully scalable, because a high computational load is required to obtain the same behavior as the original FkNN algorithm. This article proposes global approximate hybrid spill tree FkNN and local hybrid spill tree FkNN, two approximate approaches that speed up runtime without losing quality in the classification process. The experimentation compares various FkNN approaches for big data on datasets of up to 11 million instances. The results show an improvement in runtime and accuracy over algorithms from the literature.

Highlights

  • The Fuzzy k Nearest Neighbor algorithm (FkNN) [1] was developed with the aim of alleviating the main weakness of the k Nearest Neighbor algorithm [2]

  • In [22], we investigated the feasibility of an exact approach for applying FkNN to big data, called Global Exact Fuzzy k Nearest Neighbors (GE-FkNN)

  • The runtime of the classification stage is shared by GAHS-FkNN and LHS-FkNN, as both models follow the same scheme in this stage

Introduction

The Fuzzy k Nearest Neighbor algorithm (FkNN) [1] was developed with the aim of alleviating the main weakness of the k Nearest Neighbor algorithm (kNN) [2]. This weakness resides in treating all neighbors as equally important in the classification, which makes kNN more vulnerable to noise at the class boundaries and degrades the classification. FkNN is composed of two stages: class membership degree computation and classification. The second stage computes the k nearest neighbors and combines them with the membership-degree information. This makes it possible to detect class borders with greater precision, being less affected by noise and improving on kNN in most classification problems, with applications in fields such as medicine [4] and spacecraft [5].
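The two stages described above can be sketched as follows. This is a minimal illustration of the fuzzy voting idea, not the authors' big data implementation: it assumes crisp one-hot class memberships in the first stage (the classical FkNN also proposes a neighbor-based soft initialization) and the standard distance-weighted fuzzy vote with fuzzifier m in the second stage; all names here are illustrative.

```python
import numpy as np

def fknn_predict(X_train, y_train, X_test, k=3, m=2):
    """Sketch of the two FkNN stages: membership degrees, then a fuzzy vote."""
    classes = np.unique(y_train)
    # Stage 1: class membership degrees for the training instances
    # (crisp one-hot memberships here, for simplicity).
    U = (y_train[:, None] == classes[None, :]).astype(float)
    preds = []
    for x in X_test:
        # Stage 2: find the k nearest neighbors of the query point.
        d = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(d)[:k]
        # Distance weights 1 / d^(2/(m-1)); guard against zero distance.
        w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1))
        # Weighted combination of the neighbors' membership degrees.
        scores = (w[:, None] * U[nn]).sum(axis=0) / w.sum()
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)
```

Because the vote aggregates graded memberships rather than crisp labels, a query near a class boundary is scored by how strongly its neighbors belong to each class, which is what reduces the sensitivity to boundary noise.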
