Abstract

Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices to reduce the amount of computation and data on each device presents a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved using techniques to combine both feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize data exchanged between devices, to optimize run times and to find the entire model’s minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint across devices. For six devices on 100 Mbit/s connections, the integration of layer fusion additionally leads to a reduction of communication demands by up to 28.8%. This results in a run time speed-up of the inference task of up to 1.52x compared to layer partitioning without fusion.
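
To make the two partitioning styles concrete, the sketch below splits a layer across devices in the two ways the abstract describes: feature partitioning slices the input feature map spatially (with an overlap so the convolution at stripe borders can be computed locally), while weight partitioning slices the filter bank along its output channels. This is a hypothetical NumPy illustration, not the DeeperThings implementation; the function names and the halo parameter are assumptions made for this example.

import numpy as np

def split_rows(feature_map, n_devices, halo):
    # Feature partitioning: cut an (H, W, C) feature map into horizontal
    # stripes, one per device, each extended by 'halo' rows of overlap so
    # that the convolution at stripe borders can be computed locally.
    h = feature_map.shape[0]
    bounds = np.linspace(0, h, n_devices + 1, dtype=int)
    return [feature_map[max(lo - halo, 0):min(hi + halo, h)]
            for lo, hi in zip(bounds[:-1], bounds[1:])]

def split_filters(weights, n_devices):
    # Weight partitioning: give each device a slice of the output channels
    # of a (K, K, C_in, C_out) filter bank, so each device stores only
    # about 1/n of the layer's weights and produces a subset of the output
    # feature maps.
    return np.array_split(weights, n_devices, axis=3)

# Example: stripe a 416x416x3 input over six devices with one row of
# overlap, and split a 3x3x256x512 filter bank over the same six devices.
stripes = split_rows(np.zeros((416, 416, 3)), n_devices=6, halo=1)
filter_parts = split_filters(np.zeros((3, 3, 256, 512)), n_devices=6)

Feature partitioning suits the early, feature-dominated layers, where activations dominate the memory footprint; weight partitioning suits the later, weight-dominated layers, where the filters dominate. Combining both is what allows the full network to be distributed.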

Highlights

  • In the context of the Internet of Things (IoT), deep learning has emerged as a valuable tool

  • Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution

  • We propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers

Summary

Introduction

In the context of the Internet of Things (IoT), deep learning has emerged as a valuable tool. In many existing IoT applications, a large number of edge devices are available and connected with each other via some local network, for example, a cluster of surveillance cameras. This means that many existing IoT installations already provide the system architecture required for performing distributed inference. Optimizing the partitioning schemes with layer fusion reduces the communication demand by up to 28.8% when executing the inference task on six edge devices with 100 Mbit/s connections for four evaluated CNNs: YOLOv2, AlexNet, VGG-16 and a GoogLeNet derivative. This results in a run time speed-up of up to 1.52x compared to a straightforward layer partitioning. Compared to the hand-picked configuration of YOLOv2 in [29], the per-device memory footprint could be further reduced by 25%.
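
The ILP formulations themselves are not reproduced in this summary, but their flavor can be sketched: per layer, pick one partitioning scheme such that the total inter-device communication is minimized while the per-device memory footprint stays within a budget. The following PuLP sketch uses invented layer counts and cost numbers; it is a hypothetical illustration, not the paper's exact model, which additionally covers the layer fusion decisions.

import pulp

layers = [0, 1, 2]
schemes = ["feature", "weight"]
# Assumed costs: comm = data exchanged per inference (MB) if a layer uses
# a scheme; mem = per-device memory footprint (MB) under that scheme.
comm = {(0, "feature"): 4.0, (0, "weight"): 9.0,
        (1, "feature"): 6.0, (1, "weight"): 3.0,
        (2, "feature"): 8.0, (2, "weight"): 2.0}
mem = {(0, "feature"): 1.0, (0, "weight"): 0.4,
       (1, "feature"): 2.0, (1, "weight"): 0.8,
       (2, "feature"): 3.0, (2, "weight"): 1.1}
mem_budget = 3.0  # MB available per device

prob = pulp.LpProblem("partition_sketch", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (layers, schemes), cat="Binary")

# Objective: minimize the total data exchanged between devices.
prob += pulp.lpSum(comm[l, s] * x[l][s] for l in layers for s in schemes)
# Each layer is assigned exactly one partitioning scheme.
for l in layers:
    prob += pulp.lpSum(x[l][s] for s in schemes) == 1
# The summed per-device footprint must fit into the memory budget.
prob += pulp.lpSum(mem[l, s] * x[l][s] for l in layers for s in schemes) <= mem_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    chosen = next(s for s in schemes if x[l][s].value() > 0.5)
    print(f"layer {l}: {chosen} partitioning")

With these assumed numbers, the solver assigns feature partitioning to the activation-heavy first layer and weight partitioning to the later layers, mirroring the feature-dominated versus weight-dominated distinction in the section outline below.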

Related Work
Background on CNNs
Fully-Connected Layers
Convolutional Layers
Distributed Inference
Partitioning of Feature-dominated Layers
Partitioning of Weight-dominated Layers
Partitioning of Convolutional Layers
ILP-based Optimization of Partitioning Decisions
ILP-based Memory Footprint Minimization
ILP-Based Communication Optimization for Weight Partitioned Layers
Experimental Evaluation
Evaluation of the ILP-based Optimization Methods
Evaluation of ILP-based Memory Footprint Minimization
Evaluation of ILP-based Communication Optimization
Evaluation on Raspberry Pi Edge Cluster
Summary and Conclusions