Abstract

This paper presents a hardware management technique that enables energy-efficient acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge devices. It has become increasingly common for edge devices to incorporate dedicated hardware accelerators for neural processing. Neural accelerators generally follow a host-device execution model, in which CPUs offload neural computations (e.g., matrix and vector calculations) to the accelerators for datapath-optimized execution. Such serialized execution is simple to implement and manage, but it is wasteful for resource-limited edge devices to exercise only a single type of processing unit in each discrete execution phase. This paper presents a hardware management technique named NeuroPipe that utilizes the heterogeneous processing units in an embedded edge device to accelerate DNNs in an energy-efficient manner. In particular, NeuroPipe splits a neural network into groups of consecutive layers and pipelines their execution across different types of processing units. The proposed technique offers several advantages for accelerating DNN inference on embedded edge devices. It enables the embedded processor to operate at lower voltage and frequency to enhance energy efficiency while delivering the same performance as the uncontrolled baseline execution, or conversely, it can deliver faster inference at the same energy consumption. Our measurement-driven experiments on an NVIDIA Jetson AGX Xavier with 64 tensor cores and an eight-core ARM CPU demonstrate that NeuroPipe reduces energy consumption by 11.4% on average without performance degradation, or it can achieve 30.5% greater performance at the same energy consumption.
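As a rough illustration of the layer-group pipelining idea described above, the sketch below overlaps a CPU stage and an accelerator stage over a stream of inputs. It is a minimal sketch, not the authors' implementation: the split point, the two-slot hand-off queue, and the run_on_cpu/run_on_accelerator helpers are assumptions made for illustration.

    import queue
    import threading

    def run_on_cpu(layers, x):
        # Placeholder for executing a group of consecutive layers on the CPU cores.
        for layer in layers:
            x = layer(x)
        return x

    def run_on_accelerator(layers, x):
        # Placeholder for offloading a group of consecutive layers to the accelerator.
        for layer in layers:
            x = layer(x)
        return x

    def pipelined_inference(front_group, back_group, inputs):
        # Stage 1 (CPU) and stage 2 (accelerator) process different inputs
        # concurrently, so neither processing unit idles for a whole inference.
        handoff = queue.Queue(maxsize=2)
        results = []

        def stage1():
            for x in inputs:
                handoff.put(run_on_cpu(front_group, x))
            handoff.put(None)  # end-of-stream marker

        def stage2():
            while True:
                y = handoff.get()
                if y is None:
                    break
                results.append(run_on_accelerator(back_group, y))

        workers = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        return results

If the layer groups are sized so that the two stages take comparable time, the slower unit sets the throughput, which is what allows the processor's voltage and frequency to be lowered without degrading end-to-end performance, as claimed in the abstract.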

Highlights

  • Deep neural networks (DNNs) have become important applications in diverse domains encompassing autonomous driving, surveillance cameras, and a variety of Internet of Things (IoT) gadgets

  • The performance, power, energy, and thermal implications of NeuroPipe are analyzed based on hardware measurements and compared against a baseline execution, used as the target constraint, that performs a simple host-device execution scheme

  • The native computational traits of neural networks are better manifested in hardware measurement results via the deployment of a lightweight framework on the edge device

Introduction

Deep neural networks (DNNs) have become important applications in diverse domains encompassing autonomous driving, surveillance cameras, and a variety of Internet of Things (IoT) gadgets. Deploying DNN workloads on embedded systems imposes great hardware burdens on the edge devices in terms of performance, energy, power, and thermal behavior. It is challenging to perform realtime-constrained inference of DNNs on resource-limited embedded edge devices in an energy-efficient manner because of their massive number of operations and sizable data. To amortize the computational costs of DNNs, recent approaches seek to design lighter neural networks such as MobileNet [9], [25] and ShuffleNet [31] by engaging novel calculation methods that reduce the number of operations (e.g., depth-wise convolution [10], point-wise group convolution [31]). Other directions include devising optimization techniques such as compression [6], pruning [7], [14], quantization [11], [29], and mixed precision [16], [26].
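To make the reduction in operations concrete, the following back-of-the-envelope sketch compares multiply-accumulate counts for a standard convolution against the depth-wise separable factorization popularized by MobileNet [9]; the layer shape used here is an assumed example, not a figure from this paper.

    def standard_conv_macs(k, c_in, c_out, out_hw):
        # k x k kernel applied over c_in channels for each of c_out filters,
        # at every one of out_hw x out_hw output positions.
        return k * k * c_in * c_out * out_hw * out_hw

    def depthwise_separable_macs(k, c_in, c_out, out_hw):
        # Depth-wise convolution (one k x k filter per input channel)
        # followed by a 1x1 point-wise convolution.
        return k * k * c_in * out_hw * out_hw + c_in * c_out * out_hw * out_hw

    k, c_in, c_out, out_hw = 3, 128, 128, 56  # assumed layer shape for illustration
    ratio = standard_conv_macs(k, c_in, c_out, out_hw) / \
            depthwise_separable_macs(k, c_in, c_out, out_hw)
    print(f"{ratio:.1f}x fewer multiply-accumulates")  # about 8.4x for this shape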
