Hardware-friendly User-specific Machine Learning for Edge Devices

Vidushi Goyal,Reetuparna Das,Valeria Bertacco

doi:10.1145/3524125

Abstract

Machine learning (ML) on resource-constrained edge devices is expensive and often requires offloading computation to the cloud, which may compromise the privacy of user data. In contrast, the type of data processed at edge devices is user-specific and limited to a few inference classes. In this work, we explore building smaller, user-specific machine learning models, rather than utilizing a generic, compute-intensive machine learning model that caters to a diverse range of users. We first present a hardware-friendly, lightweight pruning technique to create user-specific models directly on mobile platforms, while simultaneously executing inferences. The proposed technique leverages compute sharing between pruning and inference, customizes the backward pass of training, and chooses a pruning granularity for efficient processing on edge. We then propose architectural support to prune user-specific models on a systolic edge ML inference accelerator. We demonstrate that user-specific models provide a speedup of 2.9× and 2.3× on the mobile CPUs for the ResNet-50 and Inception-V3 models.

Full Text