Abstract

We propose a new concept called Weight Separation of deep neural networks (DNNs), which enables memory-efficient and accurate deep multitask learning on a memory-constrained embedded system. The goal of weight separation is to achieve extreme packing of multiple heterogeneous DNNs into the limited memory of the system while ensuring the prediction accuracy of the constituent DNNs. The proposed approach separates the DNN weights into two types of weight-pages, each consisting of a subset of weight parameters: shared and exclusive weight-pages. It optimally distributes the weight-pages across two levels of the system memory hierarchy and stores them separately, i.e., the shared weight-pages in primary (level-1) memory (e.g., RAM) and the exclusive weight-pages in secondary (level-2) memory (e.g., flash disk or SSD). First, to reduce the memory usage of multiple DNNs, less critical weight parameters are identified and overlapped onto the shared weight-pages that are deployed in the limited space of the primary (main) memory. Next, to retain the prediction accuracy of multiple DNNs, the essential weight parameters that play a critical role in preserving prediction accuracy are stored intact, without overlapping, in the plentiful space of secondary memory storage in the form of exclusive weight-pages. We implement two real systems applying the proposed weight separation: 1) a microcontroller-based multitask IoT system that performs multitask learning of 10 scaled-down DNNs by separating the weight parameters into FRAM and flash disk, and 2) an embedded GPU system that performs multitask learning of 10 state-of-the-art DNNs, separating the weight parameters into GPU RAM and eMMC. Our evaluation shows that the memory efficiency, prediction accuracy, and execution time of deep multitask learning improve by up to 5.9x, 2.0%, and 13.1x, respectively, without any modification of the DNN models.
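The core mechanism the abstract describes can be sketched in a few lines: split a model's flat weight vector into fixed-size pages, rank pages by some criticality measure, and place the least-critical pages in the shared (primary-memory) pool while the rest go to exclusive (secondary-memory) storage. This is a minimal illustrative sketch, not the paper's implementation; the page size, the `paginate`/`separate` helper names, and the use of mean absolute weight magnitude as the criticality score are all assumptions, since the abstract does not specify the actual criterion.

```python
PAGE_SIZE = 4  # weights per page (tiny value, purely for illustration)

def paginate(weights, page_size=PAGE_SIZE):
    """Split a flat list of weights into fixed-size pages (zero-padded tail)."""
    pad = (-len(weights)) % page_size
    padded = list(weights) + [0.0] * pad
    return [padded[i:i + page_size] for i in range(0, len(padded), page_size)]

def separate(pages, shared_budget):
    """Assign the least-critical pages to the shared pool (primary memory)
    and the rest to exclusive storage (secondary memory).

    Criticality is approximated here by mean absolute weight magnitude;
    the paper's actual criterion is not given in the abstract.
    """
    scores = [sum(abs(w) for w in page) / len(page) for page in pages]
    order = sorted(range(len(pages)), key=lambda i: scores[i])  # least critical first
    shared = set(order[:shared_budget])
    return ["shared" if i in shared else "exclusive" for i in range(len(pages))]

# Example: 10 weights -> 3 pages; the 2 lowest-magnitude pages become shared.
pages = paginate([0.1] * 4 + [5.0] * 4 + [0.2, 0.2])
placement = separate(pages, shared_budget=2)
# placement == ["shared", "exclusive", "shared"]
```

In a real deployment the shared pool would be sized to fit the primary memory budget, and the overlapping of shared pages across the constituent DNNs (the step that yields the memory savings) would happen on top of this placement.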
