Abstract

A myriad of Machine Learning (ML) algorithms has emerged as a basis for many applications due to the ease of obtaining satisfactory solutions to a wide range of problems. Algorithms such as K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN) are commonly applied in Artificial Intelligence (AI) to process and analyze the ever-increasing amount of data. Nowadays, multi-core general-purpose systems are adopted for their high processing capacity. However, energy consumption tends to scale with the number of cores in use. To achieve both performance and energy efficiency, many Near-Data Processing (NDP) architectures have been proposed, mainly tackling data-movement reduction by placing computing units as close as possible to the data; examples include specific accelerators, full-stack General-Purpose Processors (GPPs) and Graphics Processing Units (GPUs), and vector units. In this work, we present the benefits of exploring Vector-In-Memory Architecture (VIMA), a general-purpose vector-based NDP architecture, for varied ML algorithms. Our work directly compares VIMA against a multi-core x86 GPP with AVX-512. The presented approach outperforms a single x86 core by up to 11.3× while improving energy efficiency by up to 8×. Moreover, our study shows that nearly 16 cores are necessary to match the NDP's single-thread performance for the KNN and MLP algorithms, while 32 cores are necessary for convolution. Nevertheless, VIMA still surpasses 32 x86 cores by 2.1× on average when considering energy results.

Full Text
Published version (Free)