Abstract

Driven by the development of new technologies such as personal assistants and autonomous cars, machine learning has rapidly become one of the most active fields in computer science. The algorithms at the core of machine learning are notoriously demanding in terms of resources, so it is of paramount importance to optimize their execution on modern processors. Several approaches have been proposed to accelerate machine learning on GPUs, massively parallel computers, and dedicated ASICs. In this paper, we focus on Intel's multi-core Xeon and many-core Xeon Phi Knights Landing (KNL) accelerator, which can host several hundred threads on a single CPU. On such architectures, thread and data mapping are key to performance. We study the impact of mapping strategies, showing that smart mapping policies can significantly speed up machine learning applications on many-core architectures: execution time was reduced by up to 25.2% on Intel Xeon and 18.5% on Xeon Phi KNL.

