Abstract

Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices to reduce the amount of computation and data on each device presents a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved using techniques to combine both feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize data exchanged between devices, to optimize run times and to find the entire model’s minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint across devices. For six devices on 100 Mbit/s connections, the integration of layer fusion additionally leads to a reduction of communication demands by up to 28.8%. This results in a run time speed-up of the inference task of up to 1.52x compared to layer partitioning without fusion.
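
To make the two partitioning styles concrete, the sketch below splits a layer across devices in the two ways the abstract describes: feature partitioning slices the input feature map spatially (with an overlap so the convolution at stripe borders can be computed locally), while weight partitioning slices the filter bank along its output channels. This is a hypothetical NumPy illustration, not the DeeperThings implementation; the function names and the halo parameter are assumptions made for this example.

import numpy as np

def split_rows(feature_map, n_devices, halo):
    # Feature partitioning: cut an (H, W, C) feature map into horizontal
    # stripes, one per device, each extended by 'halo' rows of overlap so
    # that the convolution at stripe borders can be computed locally.
    h = feature_map.shape[0]
    bounds = np.linspace(0, h, n_devices + 1, dtype=int)
    return [feature_map[max(lo - halo, 0):min(hi + halo, h)]
            for lo, hi in zip(bounds[:-1], bounds[1:])]

def split_filters(weights, n_devices):
    # Weight partitioning: give each device a slice of the output channels
    # of a (K, K, C_in, C_out) filter bank, so each device stores only
    # about 1/n of the layer's weights and produces a subset of the output
    # feature maps.
    return np.array_split(weights, n_devices, axis=3)

# Example: stripe a 416x416x3 input over six devices with one row of
# overlap, and split a 3x3x256x512 filter bank over the same six devices.
stripes = split_rows(np.zeros((416, 416, 3)), n_devices=6, halo=1)
filter_parts = split_filters(np.zeros((3, 3, 256, 512)), n_devices=6)

Feature partitioning suits the early, feature-dominated layers, where activations dominate the memory footprint; weight partitioning suits the later, weight-dominated layers, where the filters dominate. Combining both is what allows the full network to be distributed.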

Highlights

  • In the context of the Internet of Things (IoT), deep learning has emerged as a valuable tool

  • Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution

  • We propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers

Summary

Introduction

In the context of the Internet of Things (IoT), deep learning has emerged as a valuable tool. In many existing IoT applications, a large number of edge devices are available and connected with each other via some local network, for example, a cluster of surveillance cameras. This means that many existing IoT installations already provide the system architecture required for performing distributed inference. Optimizing the partitioning schemes with layer fusion reduces the communication demand by up to 28.8% when executing the inference task on six edge devices with 100 Mbit/s connections for four evaluated CNNs: YOLOv2, AlexNet, VGG-16 and a GoogLeNet derivative. This results in a run time speed-up of up to 1.52x compared to a straightforward layer partitioning. Compared to the hand-picked configuration of YOLOv2 in [29], the per-device memory footprint could be further reduced by 25%.
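
The ILP formulations themselves are not reproduced in this summary, but their flavor can be sketched: per layer, pick one partitioning scheme such that the total inter-device communication is minimized while the per-device memory footprint stays within a budget. The following PuLP sketch uses invented layer counts and cost numbers; it is a hypothetical illustration, not the paper's exact model, which additionally covers the layer fusion decisions.

import pulp

layers = [0, 1, 2]
schemes = ["feature", "weight"]
# Assumed costs: comm = data exchanged per inference (MB) if a layer uses
# a scheme; mem = per-device memory footprint (MB) under that scheme.
comm = {(0, "feature"): 4.0, (0, "weight"): 9.0,
        (1, "feature"): 6.0, (1, "weight"): 3.0,
        (2, "feature"): 8.0, (2, "weight"): 2.0}
mem = {(0, "feature"): 1.0, (0, "weight"): 0.4,
       (1, "feature"): 2.0, (1, "weight"): 0.8,
       (2, "feature"): 3.0, (2, "weight"): 1.1}
mem_budget = 3.0  # MB available per device

prob = pulp.LpProblem("partition_sketch", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (layers, schemes), cat="Binary")

# Objective: minimize the total data exchanged between devices.
prob += pulp.lpSum(comm[l, s] * x[l][s] for l in layers for s in schemes)
# Each layer is assigned exactly one partitioning scheme.
for l in layers:
    prob += pulp.lpSum(x[l][s] for s in schemes) == 1
# The summed per-device footprint must fit into the memory budget.
prob += pulp.lpSum(mem[l, s] * x[l][s] for l in layers for s in schemes) <= mem_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    chosen = next(s for s in schemes if x[l][s].value() > 0.5)
    print(f"layer {l}: {chosen} partitioning")

With these assumed numbers, the solver assigns feature partitioning to the activation-heavy first layer and weight partitioning to the later layers, mirroring the feature-dominated versus weight-dominated distinction in the section outline below.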

Related Work
Background on CNNs
Fully-Connected Layers
Convolutional Layers
Distributed Inference
Partitioning of Feature-dominated Layers
Partitioning of Weight-dominated Layers
Partitioning of Convolutional Layers
ILP-based Optimization of Partitioning Decisions
ILP-based Memory Footprint Minimization
ILP-Based Communication Optimization for Weight Partitioned Layers
Experimental Evaluation
Evaluation of the ILP-based Optimization Methods
Evaluation of ILP-based Memory Footprint Minimization
Evaluation of ILP-based Communication Optimization
Evaluation on Raspberry Pi Edge Cluster
Summary and Conclusions