Abstract

The K-nearest neighbour (KNN) algorithm is a well-known classifier applied in many research areas. Its inputs include a set of variables, the neighbourhood size (K) and a distance metric, which are usually selected according to the characteristics of the data. In previous studies, the first two have usually been decided sequentially. This paper proposes a mixed integer linear program that selects variables and determines the neighbourhood size simultaneously. The Euclidean distance is used, but the model constraints can be adapted to other distance metrics. The proposed model adopts accuracy and recall, in turn, as the objective function to determine the best combination of the two decisions in binary classification problems. Computational experiments are conducted on ten publicly available datasets. Results show that using at least half of all variables with a smaller K can already achieve equally good or better classification accuracy and recall under the respective objectives. An effective set of variables and a small neighbourhood size in KNN can facilitate solving classification problems.
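The two decisions that the model optimises jointly (which variables to use and which K to use) can be illustrated with a small brute-force stand-in. This is a minimal sketch, not the paper's mixed integer linear program: it enumerates every non-empty variable subset and every candidate K, scores each pair by leave-one-out accuracy or recall with a Euclidean-distance KNN, and keeps the best. Exhaustive enumeration is only feasible for a handful of variables. The leave-one-out evaluation, the tie-breaking rules and all function names (`knn_predict`, `leave_one_out_scores`, `joint_search`) are hypothetical choices made for illustration.

```python
import itertools
import math
from collections import Counter


def euclidean(a, b, features):
    """Euclidean distance restricted to the selected variable indices."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in features))


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training points (vote ties -> smallest label)."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x, features))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def leave_one_out_scores(X, y, k, features, positive=1):
    """Leave-one-out accuracy and recall (for the positive class) of a binary KNN."""
    correct = tp = fn = 0
    for i in range(len(X)):
        # Hold out point i and classify it with the remaining points.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_predict(train_X, train_y, X[i], k, features)
        correct += int(pred == y[i])
        if y[i] == positive:
            tp += int(pred == positive)
            fn += int(pred != positive)
    accuracy = correct / len(X)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


def joint_search(X, y, k_values, objective="accuracy"):
    """Try every non-empty variable subset with every K; return (score, subset, K).

    Subsets are visited from smallest to largest and K in the given order, and
    only a strictly better score replaces the incumbent, so ties favour fewer
    variables and then earlier K values.
    """
    idx = 0 if objective == "accuracy" else 1
    n_vars = len(X[0])
    best = None
    for size in range(1, n_vars + 1):
        for features in itertools.combinations(range(n_vars), size):
            for k in k_values:
                score = leave_one_out_scores(X, y, k, features)[idx]
                if best is None or score > best[0]:
                    best = (score, features, k)
    return best
```

On a toy dataset where only the first variable separates the two classes, for example `X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 1.2], [1.1, 8.8], [1.2, 5.1]]` with `y = [0, 0, 0, 1, 1, 1]`, `joint_search(X, y, [1, 3])` returns `(1.0, (0,), 1)`: the first variable alone with K = 1 already classifies every held-out point correctly, whereas the second variable alone scores 0 on both accuracy and recall. A mixed integer linear program, as proposed in the paper, is what replaces this exponential enumeration in practice.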
