Abstract

The K-nearest neighbour (KNN) algorithm is a well-known classifier applied in many research areas. Its inputs include a set of variables, the neighbourhood size (K) and a distance metric, which are typically selected by experimenting with the data. In previous studies, the first two have usually been decided sequentially. This paper proposes a mixed integer linear program for simultaneous variable selection and determination of the neighbourhood size, an approach that has not been explored in past research. When ties occur in the variable selection and/or the choice of K, the optimization approach searches for the best combination of decisions for a given objective. Two distance metrics, Euclidean and Hassanat, are incorporated into the model for comparison. The proposed model adopts accuracy and recall as separate objective functions to determine the best combination of the two decisions in binary classification problems. Computational experiments are conducted on ten publicly available datasets, and the results of the proposed models are compared with those of an ensemble KNN approach using the Hassanat distance. Three larger problems are also tested with the optimal K generated by the model. An effective set of variables and neighbourhood size, decided jointly by optimization, can facilitate the solution of classification problems. Model limitations are also discussed.
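The abstract does not give the mixed integer linear program itself, so the sketch below does not reproduce it. As a hedged illustration of the two joint decisions only, it scores KNN with the Euclidean and Hassanat distances over every variable subset and every candidate K, using accuracy or recall on a validation split as the objective. The exhaustive search (a swapped-in brute-force stand-in for the MILP), the validation split, the function names (`hassanat`, `knn_predict`, `exhaustive_search`) and the toy data are all hypothetical; the Hassanat formula follows its usual published per-dimension definition.

```python
import itertools
import math
from collections import Counter
from typing import Callable, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Standard Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hassanat(a: Point, b: Point) -> float:
    """Hassanat distance: a sum of per-dimension terms, each in [0, 1)."""
    total = 0.0
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        # If the minimum is negative, both values are shifted by |lo|.
        shift = -lo if lo < 0 else 0.0
        total += 1.0 - (1.0 + lo + shift) / (1.0 + hi + shift)
    return total


def knn_predict(train_X, train_y, x, k: int, features, dist: Distance) -> int:
    """Majority vote among the k nearest training points, using only `features`."""
    def proj(p):
        return [p[i] for i in features]
    ranked = sorted(range(len(train_X)), key=lambda j: dist(proj(train_X[j]), proj(x)))
    votes = Counter(train_y[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1) -> float:
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    pos = sum(t == positive for t in y_true)
    return tp / pos if pos else 0.0


def exhaustive_search(train_X, train_y, val_X, val_y, ks, dist, objective):
    """Hypothetical brute-force stand-in for the MILP: every feature subset x every k."""
    n_features = len(train_X[0])
    best = None
    for size in range(1, n_features + 1):
        for features in itertools.combinations(range(n_features), size):
            for k in ks:
                preds = [knn_predict(train_X, train_y, x, k, features, dist) for x in val_X]
                score = objective(val_y, preds)
                # Strict '>' keeps the first combination found among tied scores.
                if best is None or score > best[2]:
                    best = (features, k, score)
    return best


# Toy data (hypothetical): variable 0 separates the classes, variable 1 is noise.
train_X = [(0.0, 5.0), (0.2, 1.0), (0.1, 9.0), (1.0, 4.0), (1.1, 8.0), (0.9, 2.0)]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [(0.05, 8.5), (1.05, 1.5)]
val_y = [0, 1]
print(exhaustive_search(train_X, train_y, val_X, val_y, ks=(1, 3),
                        dist=hassanat, objective=accuracy))  # ((0,), 1, 1.0)
```

Enumeration visits 2^p − 1 variable subsets for each candidate K, where p is the number of variables. This brute-force cost is why the sketch only suits tiny examples, and why a formulation that makes both decisions inside one optimization model is attractive for larger problems.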
