Speeding up k-Nearest Neighbors classifier for large-scale multi-label learning on GPUs

Przemysław Skryjomski,Bartosz Krawczyk,Alberto Cano

doi:10.1016/j.neucom.2018.06.095

Przemysław Skryjomski, Bartosz Krawczyk + Show 1 more

https://doi.org/10.1016/j.neucom.2018.06.095

Copy DOI

Abstract

Multi-label classification is one of the most dynamically growing fields of machine learning, due to its numerous real-life applications in solving problems that can be described by multiple labels at the same time. While most of works in this field focus on proposing novel and accurate classification algorithms, the issue of the computational complexity on growing dataset sizes is somehow marginalized. Owning to the ever-increasing capabilities of data capturing, we are faced with the problem of large-scale data mining that forces learners to be not only highly accurate, but also fast and scalable on high-dimensional spaces of instances, features, and labels. In this paper, we propose a highly efficient parallel approach for computing the multi-label k-Nearest Neighbor classifier on GPUs. While this method is highly effective due to its accuracy and simplicity, its computational complexity makes it prohibitive for large-scale data. We propose a four-step implementation that takes an advantage of the GPU architecture, allowing for an efficient execution of the multi-label k-Nearest Neighbors classifier without any loss of accuracy. Experiments carried out on a number of real and artificial benchmarks show that we are able to achieve speedups up to 200 times when compared to a sequential CPU execution, while efficiently scaling up to varying number of instances and features.

Full Text