Abstract

To further improve accuracy, DNNs have become deeper and require larger-scale training datasets, which introduces dramatic computation costs. The outstanding performance of AI is thus inseparable from the support of high-end hardware, making such models difficult to deploy at the edge, where resources are limited. Consequently, large-scale AI models are generally deployed in the cloud, while end devices merely send input data to the cloud and wait for the inference results. However, cloud-only inference limits the ubiquitous deployment of AI services. Specifically, it cannot guarantee the delay requirements of real-time services, e.g., real-time detection with strict latency demands. Moreover, for sensitive data sources, data safety and privacy protection must be addressed. To deal with these issues, AI services tend to resort to edge computing. AI models should therefore be further customized to fit the resource-constrained edge, while carefully managing the trade-off between their inference accuracy and execution latency.
