Abstract
Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) are widely used in Internet of Things applications, especially for real-time image-based analysis. To address concerns such as resiliency, privacy and near-real-time analysis, these models must be deployed on edge devices. For large models in particular, the number of parameters becomes a bottleneck for inference because edge devices are resource-constrained and subject to failures and/or hardware faults. New solutions to these issues are required. This paper proposes a hybrid partitioning strategy, architecture and implementation (called HyPS), which identifies the best positions at which to split the network structure into small partitions that fit the resource constraints of edge devices, notably by decreasing instantaneous memory needs. The generated partitions consume less memory than the original network, and each partition can be processed almost independently, which opens new ways to execute CNNs at the edge. Thanks to this partitioning strategy, inference with large CNNs can be run without modifying the main model architecture. The proposed approach is assessed on the well-known VGG16 architecture for image classification. The results of the experimental campaign show that the partitioning method allows large models to be run successfully on edge devices, with limited overhead and high accuracy.
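As a rough illustration of the general idea of splitting a network so that each piece fits a device's memory, the sketch below greedily groups consecutive layers under a memory budget. It is not the HyPS algorithm, whose split-point selection is not detailed in this abstract. The function name partition_layers, the layer names and the per-layer memory figures are all hypothetical.

```python
from typing import List, Tuple

# (layer name, memory footprint in MB); the figures are made up for illustration.
Layer = Tuple[str, int]


def partition_layers(layers: List[Layer], budget_mb: int) -> List[List[str]]:
    """Greedily group consecutive layers so each group fits within budget_mb.

    Simplified stand-in for a partitioning strategy, not the HyPS method.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in layers:
        if size > budget_mb:
            raise ValueError(f"layer {name} alone exceeds the budget")
        # Close the current partition if adding this layer would overflow it.
        if current and used + size > budget_mb:
            partitions.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        partitions.append(current)
    return partitions


# Hypothetical five-layer model and a 120 MB device budget.
layers = [("conv1", 40), ("conv2", 60), ("conv3", 80), ("fc1", 100), ("fc2", 30)]
parts = partition_layers(layers, budget_mb=120)
print(parts)
```

A greedy pass like this only respects the memory budget. A real strategy would also need to weigh the cost of moving intermediate activations between partitions, which this sketch ignores.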