Abstract

K-nearest neighbours (kNN) is a popular instance-based classifier owing to its simplicity and strong empirical performance. However, building fast and compact neighbourhood-based classifiers for large-scale datasets remains challenging. This work presents the design and implementation of a classification algorithm built on index data structures, enabling fast and scalable solutions for large multidimensional datasets. We propose a novel approach that represents large-scale datasets as navigable small-world (NSW) proximity graphs. Our approach yields a 2–4 times classification speedup in both average and 99th-percentile time, with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time when the method operates in swap memory. We also show that the NSW graph used in our method outperforms other proximity graphs in classification accuracy. These results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when a search index has already been constructed for the data.

Highlights

  • Proximity graphs are a practical class of graphs with applications in multiple areas

  • We propose an improvement to the navigable small-world (NSW) and hierarchical navigable small-world (HNSW) index data structures, which yields a sustained 2–4 times average speedup over the 1-NN classification baseline

  • Our experiments study our method from three perspectives: (i) approximate nearest neighbour search (ANNS) quality of the NSW graph compared to other proximity graphs, (ii) classification accuracy compared to 1-NN, and (iii) time improvement over the baseline 1-NN classification with HNSW (a minimal sketch of this baseline follows below)
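For reference, the baseline against which timings are measured is plain 1-NN classification over an HNSW index. Below is a minimal sketch of that baseline using the hnswlib library on synthetic data; the data and the parameter values (M, ef_construction, ef) are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np
import hnswlib

# Illustrative synthetic data: feature vectors with integer class labels.
rng = np.random.default_rng(0)
dim, n_train = 32, 10_000
X_train = rng.standard_normal((n_train, dim)).astype(np.float32)
y_train = rng.integers(0, 5, size=n_train)

# Build an HNSW index over the stored training instances.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n_train, ef_construction=200, M=16)
index.add_items(X_train, np.arange(n_train))
index.set_ef(50)  # query-time search breadth: higher = more accurate, slower

def predict_1nn(queries: np.ndarray) -> np.ndarray:
    """Baseline 1-NN classification: the label of the (approximate) nearest neighbour."""
    neighbour_ids, _ = index.knn_query(queries, k=1)
    return y_train[neighbour_ids[:, 0]]

X_test = rng.standard_normal((100, dim)).astype(np.float32)
print(predict_1nn(X_test)[:10])
```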

Introduction

Proximity graphs are a practical class of graphs with applications in multiple areas. They are used in motion planning, as rapidly exploring random trees [1, 2], and as minimum spanning trees in clustering [3]. Instance-based classification (IbC) methods store items (instances) from the training dataset as part of the classifier. Unlike other methods such as decision trees and artificial neural networks, IbC algorithms do not estimate the classifier function from the training data in advance; instead, they store the training data and derive a class label from an examination of the unseen sample's nearest neighbours at test time [4]. Such methods adapt to unseen data by extending the list of stored samples.
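To make the instance-based idea concrete, the following is a minimal brute-force 1-NN classifier in Python: "training" merely memorises the instances, and prediction scans all of them. The class and its names are illustrative; this exhaustive scan is precisely the per-query linear cost that graph-based indexes such as NSW and HNSW are designed to avoid.

```python
import numpy as np

class OneNNClassifier:
    """Minimal instance-based classifier: store the data, scan it at query time."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "OneNNClassifier":
        # Training is just memorising the labelled instances.
        self.X, self.y = X, y
        return self

    def predict(self, queries: np.ndarray) -> np.ndarray:
        # Exhaustive distance scan: O(n * d) per query, the cost that
        # graph-based ANNS indexes (NSW/HNSW) reduce dramatically.
        dists = np.linalg.norm(self.X[None, :, :] - queries[:, None, :], axis=2)
        return self.y[np.argmin(dists, axis=1)]
```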
