Abstract

Ensemble methods based on k-NN models reduce the effect of outliers in a training dataset by using the k closest data points to estimate the response of an unseen observation. However, traditional k-NN based ensemble methods estimate a test point's response as the arithmetic mean of the responses of its nearest training points, which has several weaknesses. Traditional k-NN based models are also adversely affected by the presence of non-informative features in the data. This paper suggests a novel ensemble procedure consisting of a class of base k-NN models, each constructed on a bootstrap sample drawn from the training dataset with a random subset of features. Within the k nearest neighbours determined by each base model, a stepwise regression is fitted to predict the response of the test point. The final estimate of the target observation is then obtained by averaging the estimates from all the models in the ensemble. The proposed method is compared with other state-of-the-art procedures on 16 benchmark datasets using the coefficient of determination (R²), Pearson's product-moment correlation coefficient (r), mean squared prediction error (MSPE), root mean squared error (RMSE) and mean absolute error (MAE) as performance metrics; boxplots of the results are also constructed. The suggested ensemble procedure outperformed the other procedures on almost all the datasets. Its efficacy was further verified by comparing the methods after adding non-informative features to the datasets considered. The results reveal that the proposed method is more robust to non-informative features in the data than the rest of the methods.
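To make the pipeline concrete, the following is a minimal Python sketch of one plausible reading of the procedure described above. The function names (forward_stepwise_ols, oknne_predict), the greedy RSS-based selection rule, and the defaults (n_models=100, k=10, feat_frac=0.5, max_terms=3) are illustrative assumptions, not the authors' implementation.

import numpy as np

def forward_stepwise_ols(X, y, max_terms=3):
    # Greedy forward selection for an ordinary least squares fit on the k
    # neighbours. A full stepwise procedure would also apply an entry/exit
    # criterion (e.g. p-values or AIC); plain RSS improvement is used here
    # to keep the sketch short.
    n, p = X.shape
    selected, remaining = [], list(range(p))
    best_rss = float(np.sum((y - y.mean()) ** 2))   # intercept-only RSS
    beta_final = np.array([y.mean()])               # intercept-only fallback
    while remaining and len(selected) < max_terms:
        best_j, best_j_rss, best_beta = None, best_rss, None
        for j in remaining:
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if rss < best_j_rss:
                best_j, best_j_rss, best_beta = j, rss, beta
        if best_j is None:          # no candidate improves the fit; stop
            break
        selected.append(best_j)
        remaining.remove(best_j)
        best_rss, beta_final = best_j_rss, best_beta
    return selected, beta_final

def oknne_predict(X_train, y_train, x_test, n_models=100, k=10,
                  feat_frac=0.5, rng=None):
    # One test-point prediction: each base model is built on a bootstrap
    # sample with a random feature subset, finds the k nearest neighbours
    # of the test point, and fits a stepwise regression on them; the base
    # predictions are then averaged.
    rng = np.random.default_rng(rng)
    n, p = X_train.shape
    m = max(1, int(feat_frac * p))
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, n, size=n)              # bootstrap sample
        feats = rng.choice(p, size=m, replace=False)   # random feature subset
        Xb, yb = X_train[boot][:, feats], y_train[boot]
        xt = x_test[feats]
        nn = np.argsort(np.linalg.norm(Xb - xt, axis=1))[:k]
        sel, beta = forward_stepwise_ols(Xb[nn], yb[nn])
        preds.append(float(beta[0] + xt[sel] @ beta[1:]))
    return float(np.mean(preds))

In this reading, diversity comes from two sources, the bootstrap resampling and the random feature subsets, while the stepwise fit replaces the plain arithmetic mean of the neighbours' responses.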

Highlights

  • Supervised learning is a machine learning task dealing with functions that map an input to an output based on sample input–output pairs. The k-nearest neighbours (k-NN) algorithm is considered one of the top ten supervised learning methods used for classification and regression [1]–[3].

  • The proposed method is compared with k-NN, random k-NN, random forest and support vector machine on 16 datasets using the coefficient of determination (R²), Pearson's product-moment correlation coefficient (r), mean squared prediction error (MSPE), root mean squared error (RMSE) and mean absolute error (MAE) as performance metrics.

  • k-NN and random k-NN have consistently performed poorly compared with the proposed optimal k-NN ensemble (Ok-NN-E) method due to their sensitivity to non-informative features in the data.



Introduction

Supervised learning is a machine learning task dealing with functions that map an input to an output based on sample input–output pairs. The k-nearest neighbours (k-NN) algorithm is considered one of the top ten supervised learning methods used for classification and regression [1]–[3]. It uses a set of k nearest observations to decide on the response value of a test case, thereby minimizing the effect of outliers in the training dataset. The algorithm is fast, simple and easy to implement. Randomization techniques usually involve taking random samples from the training data and/or the given feature set for building the base k-NN models. This increases diversity among the base models, reducing the chance that they make correlated errors.
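For contrast with the stepwise fit sketched above, the traditional k-NN regression estimate described here, the arithmetic mean of the k nearest responses, takes only a few lines. The function name knn_regress and the choice of Euclidean distance are assumptions for illustration.

import numpy as np

def knn_regress(X_train, y_train, x_test, k=10):
    # Traditional k-NN regression: predict the arithmetic mean of the
    # responses of the k training points closest to the test point.
    d = np.linalg.norm(X_train - x_test, axis=1)   # Euclidean distances
    return float(y_train[np.argsort(d)[:k]].mean())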

