Abstract

Heterogeneous MPSoCs consisting of integrated CPUs and GPUs are suitable platforms for embedded applications running on handheld devices such as smartphones. Because handheld devices are mostly battery-powered, integrated CPU-GPU MPSoCs are usually designed with an emphasis on low power rather than performance. In this paper, we explore power-efficient layer mappings of convolutional neural networks (CNNs) deployed on integrated CPU-GPU platforms. Specifically, we investigate the impact of the layer mapping of YoloV3-Tiny (a CNN widely used in both industry and academia) on system power consumption through extensive experiments on an NVIDIA Jetson TX2 board. The experimental results indicate that 1) almost none of the convolutional layers are suitable for mapping to the CPU, 2) the pooling layer can be mapped to the CPU to reduce power consumption, although this mapping may slow inference when the layer's output tensor is large, 3) the detection layer can be mapped to the CPU as long as its floating-point operation count is not too large, and 4) the channel and upsampling layers are both suitable for mapping to the CPU. These observations can be used to guide the design of power-efficient layer mapping strategies for integrated CPU-GPU platforms.
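The four observations above amount to a per-layer placement heuristic. A minimal sketch of such a rule is shown below; note that the function name, layer-type labels, and both threshold values are illustrative assumptions and not values reported in the paper.

```python
# Hypothetical per-layer CPU/GPU placement rule following the abstract's four
# observations. Thresholds are placeholder assumptions, not measured values.

def map_layer(layer_type, output_tensor_elems=0, flops=0,
              pool_output_threshold=1_000_000,
              det_flops_threshold=100_000_000):
    """Return 'CPU' or 'GPU' for one YoloV3-Tiny layer."""
    if layer_type == "convolutional":
        return "GPU"  # observation 1: conv layers rarely suit the CPU
    if layer_type == "maxpool":
        # observation 2: CPU saves power unless the output tensor is large,
        # in which case the mapping would slow inference
        return "CPU" if output_tensor_elems <= pool_output_threshold else "GPU"
    if layer_type == "detection":
        # observation 3: CPU is acceptable while the FLOP count stays modest
        return "CPU" if flops <= det_flops_threshold else "GPU"
    if layer_type in ("channel", "upsample"):
        return "CPU"  # observation 4: both map well to the CPU
    return "GPU"      # default for unlisted layer types
```

In practice the two thresholds would be calibrated by profiling power and latency on the target board, which is exactly the kind of measurement the paper performs on the Jetson TX2.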
