Abstract

Recently, the need to run high-performance neural networks (NNs) has been increasing, even in resource-constrained embedded systems such as wearable devices. However, due to the high computational and memory requirements of NN applications, it is typically infeasible to execute them on a single device. Instead, it has been proposed to run a single NN application cooperatively on multiple devices, a so-called distributed neural network. In a distributed neural network, the workload of a single large NN application is distributed over multiple tiny devices. While the computation overhead can effectively be alleviated by this approach, existing distributed NN techniques, such as MoDNN, still suffer from heavy inter-device traffic and vulnerability to communication failures. To eliminate such large communication overheads, a knowledge distillation based distributed NN, called Network of Neural Networks (NoNN), was proposed, which partitions the filters in the final convolutional layer of the original NN into multiple independent subsets and derives a smaller NN from each subset. However, NoNN also has limitations: the partitioning result may be unbalanced, and it considerably compromises the correlation between filters in the original NN, which may result in unacceptable accuracy degradation in the case of communication failure. In this paper, to overcome these issues, we propose to enhance the partitioning strategy of NoNN in two aspects. First, we increase the redundancy of the filters used to derive the multiple smaller NNs by means of averaging, improving the immunity of the distributed NN to communication failures. Second, we propose a novel partitioning technique, modified from eigenvector-based partitioning, to preserve the correlation between filters as much as possible while keeping the number of filters distributed to each device consistent. Through extensive experiments with the CIFAR-100 (Canadian Institute For Advanced Research-100) dataset, we observed that the proposed approach maintains high inference accuracy (over 70% on average, a 1.53× improvement over the state-of-the-art approach) even when half of the eight devices in the distributed NN fail to deliver their partial inference results.
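As a rough illustration of the eigenvector-based idea described above, the Python sketch below partitions filters into equally sized groups by ordering them along the Fiedler vector of a correlation graph built from final-layer activations. This is only a minimal stand-in under simplifying assumptions, not the modified partitioning algorithm proposed in the paper; the function name, the use of the unnormalized Laplacian, and the equal-chunk split are illustrative choices.

```python
import numpy as np

def balanced_spectral_partition(activations, num_parts):
    """Group filters into equally sized parts using a spectral
    (eigenvector-based) heuristic on their activation correlations.

    activations: array of shape (num_samples, num_filters) holding the
    spatially averaged responses of the final convolutional layer.
    """
    # Pairwise correlation between filters serves as graph edge weights.
    corr = np.corrcoef(activations.T)          # (F, F)
    sim = np.abs(corr)
    np.fill_diagonal(sim, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    degree = np.diag(sim.sum(axis=1))
    laplacian = degree - sim

    # The Fiedler vector (eigenvector of the second-smallest eigenvalue)
    # orders the filters so that strongly correlated ones stay close.
    _, eigvecs = np.linalg.eigh(laplacian)
    order = np.argsort(eigvecs[:, 1])

    # Split the ordering into equally sized chunks so that every
    # device receives the same number of filters.
    return np.array_split(order, num_parts)

# Toy usage with random data standing in for real teacher activations.
acts = np.random.randn(512, 64)                # 512 samples, 64 filters
parts = balanced_spectral_partition(acts, num_parts=8)
print([len(p) for p in parts])                 # eight groups of 8 filters
```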

Highlights

  • Resource-constrained computer systems such as Internet-of-Things (IoT) or wearable devices are becoming increasingly popular in the market

  • We identified limitations of the state-of-the-art distributed neural network (NN), called Network of Neural Networks (NoNN): its partitioning technique may result in unbalanced filter assignments across the distributed devices, and the correlation between filters is not sufficiently considered during partitioning

Summary

Introduction

Resource-constrained computer systems such as Internet-of-Things (IoT) or wearable devices are becoming increasingly popular in the market. To run high-performance NN applications on such devices, existing distributed NN techniques such as MoDNN partition the computation of a single NN across multiple devices. In this case, an intermediate feature map lying on the partition boundary must be transferred between devices in its entirety, resulting in a considerable communication burden. To eliminate such prohibitive communication overheads, Bhardwaj et al. [12] proposed a completely different distribution approach, called Network of Neural Networks (NoNN), based on Knowledge Distillation (KD).
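For reference, the soft-label component of standard knowledge distillation trains a student network to match the teacher's temperature-softened output distribution. The NumPy sketch below shows this generic Hinton-style loss only; the temperature value and function names are illustrative assumptions, and NoNN's full objective additionally relies on attention transfer, which is not shown here.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the teacher's softened predictions
    ("soft labels") and the student's softened predictions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # T^2 scaling keeps gradients comparable to the hard-label term.
    return -(temperature ** 2) * np.mean(
        np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1))

# Toy usage with random logits (batch of 4 samples, 100 classes).
teacher = np.random.randn(4, 100)
student = np.random.randn(4, 100)
print(kd_soft_label_loss(student, teacher))
```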

Knowledge Distillation
Soft Label
Attention Transfer
Overview of NoNN
Partitioning Strategy of NoNN
Averaging
Eigenvector-Based Partitioning
Experiments and Discussion
Effects of Averaging and Eigenvector-Based Partitioning
Effects of Different Part Numbers and Sizes
Effects of Teacher Network Size
Effects of the Number of Student Networks
Findings
Conclusions