Abstract

K-nearest neighbours (kNN) is a popular instance-based classifier owing to its simplicity and strong empirical performance. However, building fast and compact neighbourhood-based classifiers for large-scale datasets remains challenging. This work presents the design and implementation of a classification algorithm built on index data structures, enabling fast and scalable solutions for large multidimensional datasets. We propose a novel approach that represents large-scale datasets as navigable small-world (NSW) proximity graphs. Our approach yields a 2–4 times classification speedup in both average and 99th-percentile time, with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time when the method operates in swap memory. We also show that the NSW graph used in our method outperforms other proximity graphs in classification accuracy. These results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when a search index has already been constructed for the data.

Highlights

  • Proximity graphs are a practical class of graphs with applications in multiple areas

  • We propose an improvement to the navigable small-world (NSW) and hierarchical navigable small-world (HNSW) index data structures, which yields a sustained 2–4 times average speedup over the 1-NN classification baseline

  • Our experiments study our method from three perspectives: (i) approximate nearest neighbour search (ANNS) quality of the NSW graph compared to other proximity graphs, (ii) classification accuracy compared to 1-NN, and (iii) time improvement over the baseline 1-NN classification with HNSW (a minimal sketch of this baseline follows below)
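For reference, the baseline against which timings are measured is plain 1-NN classification over an HNSW index. Below is a minimal sketch of that baseline using the hnswlib library on synthetic data; the data and the parameter values (M, ef_construction, ef) are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np
import hnswlib

# Illustrative synthetic data: feature vectors with integer class labels.
rng = np.random.default_rng(0)
dim, n_train = 32, 10_000
X_train = rng.standard_normal((n_train, dim)).astype(np.float32)
y_train = rng.integers(0, 5, size=n_train)

# Build an HNSW index over the stored training instances.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n_train, ef_construction=200, M=16)
index.add_items(X_train, np.arange(n_train))
index.set_ef(50)  # query-time search breadth: higher = more accurate, slower

def predict_1nn(queries: np.ndarray) -> np.ndarray:
    """Baseline 1-NN classification: the label of the (approximate) nearest neighbour."""
    neighbour_ids, _ = index.knn_query(queries, k=1)
    return y_train[neighbour_ids[:, 0]]

X_test = rng.standard_normal((100, dim)).astype(np.float32)
print(predict_1nn(X_test)[:10])
```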

Introduction

Proximity graphs are a practical class of graphs with applications in multiple areas. They are used in motion planning, as rapidly exploring random trees [1, 2], and as minimum spanning trees in clustering [3]. Instance-based classification (IbC) methods store items (instances) from the training dataset as part of the classifier. Unlike other methods such as decision trees and artificial neural networks, IbC algorithms do not estimate the classifier function from the training data in advance; instead, they store the training data and derive a class label from an examination of the unseen sample's nearest neighbours at test time [4]. Such methods adapt to unseen data by extending the list of stored samples.
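To make the instance-based idea concrete, the following is a minimal brute-force 1-NN classifier in Python: "training" merely memorises the instances, and prediction scans all of them. The class and its names are illustrative; this exhaustive scan is precisely the per-query linear cost that graph-based indexes such as NSW and HNSW are designed to avoid.

```python
import numpy as np

class OneNNClassifier:
    """Minimal instance-based classifier: store the data, scan it at query time."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "OneNNClassifier":
        # Training is just memorising the labelled instances.
        self.X, self.y = X, y
        return self

    def predict(self, queries: np.ndarray) -> np.ndarray:
        # Exhaustive distance scan: O(n * d) per query, the cost that
        # graph-based ANNS indexes (NSW/HNSW) reduce dramatically.
        dists = np.linalg.norm(self.X[None, :, :] - queries[:, None, :], axis=2)
        return self.y[np.argmin(dists, axis=1)]
```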
