Abstract

Many machine-learning techniques are available for classification and regression tasks. The k-nearest neighbours (k-NN) method is a well-known algorithm for both. It identifies the k training observations nearest to a given test point, which reduces the impact of outliers in the training data. For regression, it predicts the mean response of these neighbours; for classification, it predicts their majority class. This study proposes a novel ensemble approach that builds each k-NN model on a bootstrap sample of the training data and a randomly selected subset of features. Stepwise logistic regression is then fitted to the nearest neighbours identified by each k-NN model to estimate the response of the test observation. The final estimate of the test point's response is obtained by majority voting over the estimates from the individual k-NN models. The proposed method is compared with other methods on five benchmark datasets, using the Brier score, sensitivity, and accuracy as performance metrics. The results indicate that the proposed ensemble outperforms the other methods on most of the datasets. The proposed ensemble is also used for feature selection and compared with four other feature-selection methods on nine benchmark datasets. These results show that the proposed method performs better than the competing methods.
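The abstract does not give the exact settings, so what follows is only a minimal pure-Python sketch of the described pipeline for binary 0/1 labels, built on several assumptions. First, plain L2-penalised logistic regression fitted by gradient descent stands in for the stepwise logistic regression used in the study. Second, the names `ensemble_predict`, `fit_logistic` and `brier_score` are hypothetical, as are the defaults (25 models, k = 10, two features per model). Third, the share of votes for class 1 serves as the probability estimate when computing the Brier score.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, x, feats):
    # Intercept w[0] plus a weighted sum over the selected features only
    return w[0] + sum(w[k + 1] * x[j] for k, j in enumerate(feats))


def fit_logistic(X, y, feats, lr=0.1, epochs=200, l2=0.01):
    # ASSUMPTION: plain L2-penalised logistic regression via batch gradient
    # descent, standing in for the stepwise logistic regression of the paper.
    w = [0.0] * (len(feats) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear(w, xi, feats)) - yi
            grad[0] += err
            for k, j in enumerate(feats):
                grad[k + 1] += err * xi[j]
        for k in range(len(w)):
            penalty = l2 * w[k] if k > 0 else 0.0  # intercept is not penalised
            w[k] -= lr * (grad[k] / n + penalty)
    return w


def ensemble_predict(X_train, y_train, x_test, n_models=25, k=10, n_feats=2, seed=0):
    # Hypothetical defaults; labels must be 0/1.
    # Returns (predicted label, share of models voting for class 1).
    rng = random.Random(seed)
    n, p = len(X_train), len(X_train[0])
    votes = []
    for _ in range(n_models):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feats = rng.sample(range(p), n_feats)        # random feature subset for this model
        # k nearest neighbours of x_test, measured only on the chosen features
        nearest = sorted(
            boot, key=lambda i: sum((X_train[i][j] - x_test[j]) ** 2 for j in feats)
        )[:k]
        y_nn = [y_train[i] for i in nearest]
        if len(set(y_nn)) == 1:
            votes.append(y_nn[0])  # single-class neighbourhood: no local fit needed
        else:
            w = fit_logistic([X_train[i] for i in nearest], y_nn, feats)
            votes.append(1 if sigmoid(linear(w, x_test, feats)) >= 0.5 else 0)
    share = sum(votes) / n_models
    return (1 if share > 0.5 else 0), share  # majority vote; ties go to class 0


def brier_score(probs, labels):
    # Mean squared difference between predicted probability and 0/1 outcome
    return sum((q - y) ** 2 for q, y in zip(probs, labels)) / len(labels)
```

Calling `ensemble_predict(X, y, x_new)` returns the voted label together with the vote share, which can be passed to `brier_score` alongside the true labels. The sketch skips the local fit when all k neighbours share one class, because a logistic model on a single-class sample has no finite maximum-likelihood solution.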
