Abstract

In today’s world different data sets are available on which regression or classification algorithms of machine learning are applied. One of the classification algorithms is k-nearest neighbor (kNN) which computes distance amongst various rows in a dataset. The performance of kNN is evaluated based on K-value and distance metric used, where K is the total count of neighboring elements. Many different distance metrics have been used by researchers in literature, one of them is Canberra distance metric. In this paper the performance of kNN based on Canberra distance metric is measured on different datasets, further the proposed Canberra distance metric, namely, Modified Euclidean-Canberra Blend Distance (MECBD) metric has been applied to the kNN algorithm which led to improvement of class prediction efficiency on the same datasets measured in terms of accuracy, precision, recall, F1-score for different values of k. Further, this study depicts that MECBD metric use led to improvement in accuracy value 80.4% to 90.3%, 80.6% to 85.4% and 70.0% to 77.0% for various data sets used. Also, implementation of ROC curves and auc for k= 5 is done to show the improvement is kNN model prediction which showed increase in auc values for different data sets, for instance increase in auc values from 0.873 to 0.958 for Spine (2 Classes) dataset, 0.857 to 0.940, 0.983 to 0.983 (no change), 0.910 to 0.957 for DH, SL and NO class for Spine (3 Classes) data set and 0.651 to 0.742 for Haberman’s data set.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call