Abstract

In recent years, deep learning (DL) models have demonstrated remarkable achievements on non-trivial tasks such as speech recognition, image processing, and natural language understanding. One of the significant contributors to the success of DL is the proliferation of end devices, which act as a catalyst by providing data for data-hungry DL models. However, the computation required for DL training and inference remains a major challenge. Moreover, this computation is usually performed on central cloud servers, which introduces further challenges such as high latency, increased communication cost, and privacy concerns. To mitigate these drawbacks, considerable effort has been made to push the processing of DL models to edge servers (a mesh of computing devices near end devices). Recently, the confluence of DL and the edge has given rise to edge intelligence (EI), defined by the International Electrotechnical Commission (IEC) as the concept where data is acquired, stored, and processed using edge computing with DL and advanced networking capabilities. Broadly, EI comprises six levels, categorized by the three locations where DL training and inference can take place: cloud servers, edge servers, and end devices. This survey focuses primarily on the fifth level of EI, called the all in-edge level, where DL training and inference (deployment) are performed solely by edge servers. All in-edge is suitable when end devices have low computing resources (e.g., Internet-of-Things devices) and when requirements such as low latency and low communication cost are critical, as in mission-critical applications (e.g., healthcare). Moreover, 5G/6G networks are envisioned to adopt the all in-edge paradigm. Firstly, this paper presents all in-edge computing architectures, including centralized, decentralized, and distributed. Secondly, it presents enabling technologies, such as model parallelism, data parallelism, and split learning, which facilitate DL training and deployment at edge servers. Thirdly, model adaptation techniques based on model compression and conditional computation are described, because standard cloud-based DL deployment cannot be applied directly at the all in-edge level due to its limited computational resources. Fourthly, this paper discusses eleven key performance metrics for efficiently evaluating the performance of DL at the all in-edge level. Finally, several open research challenges in the area of all in-edge are presented.
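To make one of the enabling technologies named above concrete, the following is a minimal, self-contained sketch of split learning, assuming a PyTorch setting in which a network is partitioned at a cut layer between two parties (e.g., an end device and an edge server, or two edge servers). The layer sizes, variable names, and the in-process simulation of the network transfer are illustrative assumptions for exposition, not the implementation described in the paper.

```python
# Minimal split-learning sketch (single training step), with the "client"
# and "edge server" partitions simulated in one process. In a real
# deployment, the cut-layer activations and their gradients would be
# exchanged over the network instead of via local tensors.
import torch
import torch.nn as nn

# Client-side partition: runs on the resource-constrained party.
client_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
# Server-side partition: runs on the edge server.
server_net = nn.Sequential(nn.Linear(64, 10))

opt_client = torch.optim.SGD(client_net.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 32)            # a batch of raw client data (illustrative)
y = torch.randint(0, 10, (8,))    # labels for the batch

# Forward pass: the client computes the cut-layer ("smashed") activations
# and sends them to the server; raw data never leaves the client.
smashed = client_net(x)
smashed_remote = smashed.detach().requires_grad_()  # what the server receives

# The server finishes the forward pass, computes the loss, and backpropagates
# through its own partition, which also yields the cut-layer gradient.
logits = server_net(smashed_remote)
loss = loss_fn(logits, y)
opt_server.zero_grad()
loss.backward()
opt_server.step()

# The cut-layer gradient is returned to the client, which completes the
# backward pass through its local partition and updates its parameters.
opt_client.zero_grad()
smashed.backward(smashed_remote.grad)
opt_client.step()
```

The design point this sketch illustrates is the privacy/communication trade-off the survey discusses: only cut-layer activations and gradients cross the boundary between the two parties, so per-step communication scales with the cut-layer width rather than with the raw data or the full model.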
