Abstract

The fuzzy k-nearest neighbor (FKNN) algorithm, one of the most well-known and effective supervised learning techniques, has often been used in data classification problems but rarely in regression settings. This paper introduces a new, more general fuzzy k-nearest neighbor regression model. The generalization is based on the use of the Minkowski distance instead of the usual Euclidean distance. The Euclidean distance is often not the optimal choice for practical problems, and better results can be obtained by generalizing it. Using the Minkowski distance allows the proposed method to find more reasonable nearest neighbors of the target sample. Another key advantage of this method is that the nearest neighbors are weighted by fuzzy weights based on their similarity to the target sample, leading to more accurate predictions through a weighted average. The performance of the proposed method is tested on eight real-world datasets from different fields and benchmarked against the k-nearest neighbor and three other state-of-the-art regression methods. The Manhattan distance- and Euclidean distance-based FKNNreg methods are also implemented, and their results are compared. The empirical results show that the proposed Minkowski distance-based fuzzy k-nearest neighbor regression (Md-FKNNreg) method outperforms the benchmarks and can be a good algorithm for regression problems. In particular, the Md-FKNNreg model achieved the lowest overall average root mean square error (0.0769) of all the regression methods considered, a statistically significant improvement. As a special case of the Minkowski distance, the Manhattan distance yielded the optimal conditions for Md-FKNNreg and achieved the best performance on most of the datasets.
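The Minkowski distance that the abstract refers to generalizes both the Manhattan (p = 1) and Euclidean (p = 2) distances. A minimal sketch, assuming plain list-of-numbers vectors (the function name is illustrative, not from the paper):

```python
def minkowski_distance(x, y, p):
    """Minkowski distance between two equal-length vectors:
    (sum_i |x_i - y_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# p = 2 recovers the Euclidean distance, p = 1 the Manhattan distance
print(minkowski_distance([0, 0], [3, 4], 2))  # 5.0
print(minkowski_distance([0, 0], [3, 4], 1))  # 7.0
```

Tuning p beyond these two special cases is what gives the proposed method its extra flexibility in choosing neighbors.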

Highlights

  • In machine learning, a regression problem refers to estimating a real-valued continuous response based on the values of one or more input variables

  • The Minkowski distance-based fuzzy k-nearest neighbor regression (Md-FKNNreg) method achieved significantly better performance than the Euclidean distance-based Euc-FKNNreg method, even though the results of the two methods were comparable in some cases, for example, on the Servo and Stock datasets

  • The regression results of the Md-FKNNreg and baseline models are evaluated on testing data samples that were initially split from the original datasets


Introduction

A regression problem refers to estimating a real-valued continuous response (output) based on the values of one or more input variables. By determining the relationships between output and input variables, a regression method numerically predicts a target value. Various regression techniques have been introduced for a wide range of machine learning problems. K-nearest neighbor regression (KNNreg) (Benedetti 1977; Stone 1977; Turner 1977) has become a popular nonparametric approach, but treating all nearest neighbors equally leaves it sensitive to noise and uncertainty in the data. To improve the model and alleviate such issues, Keller et al. (1985) introduced the idea of using degrees of membership in the KNN method to propose its fuzzy version, called the fuzzy k-nearest neighbor (FKNN) classifier. Thanks to its capability of tackling uncertainty issues in the data, the FKNN model has been widely adopted in classification problems.
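The fuzzy-weighted neighbor averaging described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the inverse-distance weighting 1/d^(2/(m-1)) with fuzziness parameter m is a common FKNN weighting scheme assumed here, and all function and variable names are hypothetical.

```python
def md_fknn_reg(X_train, y_train, x_query, k=3, p=2, m=2.0):
    """Predict a continuous target as a fuzzy-weighted average of the
    k nearest neighbors under the Minkowski distance with exponent p."""
    # Minkowski distance from the query to every training sample
    dists = [
        (sum(abs(a - b) ** p for a, b in zip(x, x_query)) ** (1.0 / p), y)
        for x, y in zip(X_train, y_train)
    ]
    dists.sort(key=lambda t: t[0])
    neighbors = dists[:k]
    eps = 1e-12  # guard against division by zero for exact matches
    # Fuzzy weights: closer neighbors get larger membership-style weights
    weights = [1.0 / (d + eps) ** (2.0 / (m - 1.0)) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)
```

For example, with training points x = 0, 1, 2, 4 (targets equal to x), a query at 1.5 with k=2 averages the two equally distant neighbors and returns 1.5; increasing m flattens the weights toward an unweighted mean, while m close to 1 concentrates weight on the single nearest neighbor.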
