Abstract

Recently, edge computing has emerged as a new paradigm that compensates for the disadvantages of current cloud computing. In particular, edge computing serves low-latency applications that operate on local data. For this emerging technology, a neural network approach is required to run large-scale machine learning on edge servers. In this paper, we propose a pod allocation method that adds various graphics processing unit (GPU) resources to increase the efficiency of a Kubernetes-based edge server built from GPU-based embedded boards and running TensorFlow-based neural network service applications. From experiments performed on the proposed edge server, we infer the following: 1) The bandwidth available to service applications varies with time and data size, ranging from 20.4–42.4 Mbps in the local environment and 6.31–25.5 Mbps over the Internet. 2) When two neural network applications run on an edge server consisting of Xavier, TX2, and Nano worker nodes, the network times of the object detection application range from 112.2 ms (Xavier) to 515.8 ms (Nano), and those of the driver profiling application range from 321.8 ms (Xavier) to 495.7 ms (Nano). 3) The proposed pod allocation method outperforms the default pod allocation method: the number of allocatable pods on the three worker nodes increases from five to seven, and compared with prior work, the proposed offloading achieves similar or better response times in environments running multiple deep learning applications.
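The abstract does not spell out how pods are matched to GPU resources. A minimal sketch of one plausible allocation policy, assuming each worker node advertises its free GPU memory as an extended resource and pods are placed best-fit, might look like the following (node names echo the boards in the paper, but all capacities and helper names are illustrative, not the paper's method or measurements):

```python
def allocate(pod_gpu_mem_mb, nodes):
    """Hypothetical best-fit allocator: place a pod on the node with the
    least remaining GPU memory that still fits it; return the node name,
    or None if no node can host the pod."""
    candidates = [n for n in nodes if nodes[n] >= pod_gpu_mem_mb]
    if not candidates:
        return None
    best = min(candidates, key=lambda n: nodes[n])
    nodes[best] -= pod_gpu_mem_mb  # reserve the memory on that node
    return best

# Illustrative free-GPU-memory table (MB) for the three worker boards.
free_gpu = {"xavier": 8000, "tx2": 4000, "nano": 2000}
print(allocate(1500, free_gpu))  # -> "nano" (tightest fit)
print(allocate(3000, free_gpu))  # -> "tx2" (nano no longer fits)
```

Best fit packs small pods onto the smallest boards first, which is one way the number of co-resident pods could rise versus a GPU-unaware default scheduler; the paper's actual mechanism may differ.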
