Running AI models on edge devices is challenging because such devices have limited processing power, battery capacity, and strict latency requirements. This article explores methods for optimizing AI models in such settings so that they deliver fast, accurate results within these constraints. These include model compression techniques such as pruning, quantization, and knowledge distillation, which reduce model size so that inference can be performed with less computation and lower energy use. The article also examines federated learning as a way to train AI models across a distributed network of devices while preserving user privacy, since raw data never needs to be transferred to a central server. Another approach, distributed inference, in which computation is partitioned across multiple devices, is investigated as a means of improving system performance and reducing latency. The application of these techniques is discussed in the context of the limited capabilities inherent to devices such as smartphones, IoT sensors, and autonomous systems. Overall, this work aims to improve inference and model deployment in edge AI systems, enhancing the end-user experience and energy efficiency and bringing scalable edge-computing solutions closer to reality through application-optimized edge AI models and frameworks.
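The abstract names pruning and quantization without showing them; below is a minimal illustrative sketch, assuming PyTorch's built-in pruning and dynamic-quantization utilities. The toy network, layer sizes, and 50% sparsity level are assumptions for demonstration, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy network standing in for an edge-deployed model (hypothetical;
# the abstract does not specify an architecture).
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

# Pruning: zero out the 50% of weights with the smallest L1 magnitude
# in each Linear layer, then make the sparsity permanent.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# Post-training dynamic quantization: Linear-layer weights are stored
# as int8, shrinking the model and cutting compute and energy cost.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 10])
```

In this combination, pruning removes low-magnitude weights and quantization lowers numeric precision, the two complementary routes to a smaller model that the abstract describes.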
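The abstract likewise describes federated learning only at a high level. The following is a hypothetical sketch of one FedAvg-style aggregation round in NumPy; the function names and the least-squares local update are illustrative assumptions, not the paper's protocol. Only model weights leave each client, never raw data.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1):
    # One gradient step on a least-squares objective, standing in for
    # the on-device training loop (assumed for illustration).
    grad = data.T @ (data @ weights - labels) / len(data)
    return weights - lr * grad

def federated_round(global_weights, client_datasets):
    # Each client trains locally on private data; the server averages
    # the returned weights, weighted by local dataset size.
    updates, sizes = [], []
    for data, labels in client_datasets:
        updates.append(local_update(global_weights.copy(), data, labels))
        sizes.append(len(data))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Three simulated clients with private synthetic datasets.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(4)
for _ in range(10):
    w = federated_round(w, clients)
print(w)
```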