Abstract
In state-of-the-art deep neural networks (DNNs), the layer-wise activation maps lead to significant data movement in hardware accelerators operating on real-time streaming inputs. We explore an architecture-aware algorithmic approach to reducing data movement and the resulting latency and power consumption. This article presents an attention-based feedback mechanism for controlling input data, referred to as activation pruning, that reduces activation maps in the early layers of a DNN, which are critical for reducing data movement in real-time AI processing. The proposed approach is demonstrated by coupling RGB and Lidar images to perform real-time perception and local motion planning in autonomous systems. Lidar data is used to determine “Pixels of Interest” (PoI) in an RGB image based on their distance from the sensor, prune the RGB image to perform object detection only within the PoI, and use the detected objects to perform local motion planning. Experiments on sequences from the KITTI dataset show that activation pruning maintains the quality of motion planning while increasing the sparsity of the activation maps. Sparsity-aware compute architectures are considered to leverage activation sparsity for improved performance. Simulation results show that the proposed activation pruning algorithm reduces data movement (38.5%), computational load (30.1%), and memory latency (76.3%) in a sparsity-aware compute architecture, leading to faster perception and lower energy consumption.
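To make the PoI-based pruning idea concrete, the following is a minimal sketch of how Lidar depth could gate an RGB input before object detection. It assumes the Lidar point cloud has already been projected to a per-pixel depth map aligned with the image; the function names, the 30 m range threshold, and the zero-depth convention for missing returns are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pixels_of_interest(depth_map: np.ndarray, max_range_m: float = 30.0) -> np.ndarray:
    """Boolean mask of pixels whose projected Lidar depth is within max_range_m.

    Nearby objects matter most for local motion planning; pixels with no
    Lidar return (encoded here as depth 0) are excluded from the PoI.
    """
    return (depth_map > 0) & (depth_map <= max_range_m)

def prune_rgb(rgb: np.ndarray, poi_mask: np.ndarray) -> np.ndarray:
    """Zero out RGB pixels outside the PoI.

    The pruned input is highly sparse, so the early-layer activation maps
    it produces are sparse as well, which a sparsity-aware accelerator can
    exploit to skip data movement and computation.
    """
    return rgb * poi_mask[..., None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, size=(375, 1242, 3), dtype=np.uint8)  # KITTI-like resolution
    depth = rng.uniform(0.0, 80.0, size=(375, 1242))                 # synthetic depth map (stand-in for projected Lidar)
    mask = pixels_of_interest(depth, max_range_m=30.0)
    pruned = prune_rgb(rgb, mask)
    print(f"input sparsity after pruning: {1.0 - mask.mean():.1%}")
```

In this sketch, the detector would then run only on `pruned`; the feedback aspect described in the abstract (attention driven by the Lidar stream controlling the next frame's input) would update the mask per frame.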