Abstract
In state-of-the-art deep neural networks (DNNs), the layer-wise activation maps lead to significant data movement in hardware accelerators operating on real-time streaming inputs. We explore an architecture-aware algorithmic approach to reducing data movement and the resulting latency and power consumption. This article presents an attention-based feedback mechanism for controlling input data, referred to as activation pruning, that reduces activation maps in the early layers of a DNN, which are critical for reducing data movement in real-time AI processing. The proposed approach is demonstrated by coupling RGB and Lidar images to perform real-time perception and local motion planning in autonomous systems. Lidar data is used to determine “Pixels of Interest” (PoI) in an RGB image based on their distance from the sensor, prune the RGB image to perform object detection only within the PoI, and use the detected objects to perform local motion planning. Experiments on sequences from the KITTI dataset show that activation pruning maintains the quality of motion planning while increasing the sparsity of the activation maps. Sparsity-aware compute architectures are considered to leverage activation sparsity for improved performance. Simulation results show that the proposed activation pruning algorithm reduces data movement (38.5%), computational load (30.1%), and memory latency (76.3%) in a sparsity-aware compute architecture, leading to faster perception and lower energy consumption.
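To make the PoI-based pruning idea concrete, the following is a minimal sketch of how Lidar depth could gate an RGB input before object detection. It assumes the Lidar point cloud has already been projected to a per-pixel depth map aligned with the image; the function names, the 30 m range threshold, and the zero-depth convention for missing returns are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pixels_of_interest(depth_map: np.ndarray, max_range_m: float = 30.0) -> np.ndarray:
    """Boolean mask of pixels whose projected Lidar depth is within max_range_m.

    Nearby objects matter most for local motion planning; pixels with no
    Lidar return (encoded here as depth 0) are excluded from the PoI.
    """
    return (depth_map > 0) & (depth_map <= max_range_m)

def prune_rgb(rgb: np.ndarray, poi_mask: np.ndarray) -> np.ndarray:
    """Zero out RGB pixels outside the PoI.

    The pruned input is highly sparse, so the early-layer activation maps
    it produces are sparse as well, which a sparsity-aware accelerator can
    exploit to skip data movement and computation.
    """
    return rgb * poi_mask[..., None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, size=(375, 1242, 3), dtype=np.uint8)  # KITTI-like resolution
    depth = rng.uniform(0.0, 80.0, size=(375, 1242))                 # synthetic depth map (stand-in for projected Lidar)
    mask = pixels_of_interest(depth, max_range_m=30.0)
    pruned = prune_rgb(rgb, mask)
    print(f"input sparsity after pruning: {1.0 - mask.mean():.1%}")
```

In this sketch, the detector would then run only on `pruned`; the feedback aspect described in the abstract (attention driven by the Lidar stream controlling the next frame's input) would update the mask per frame.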