With the widespread adoption of deep neural networks on mobile devices, the drawbacks of traditional cloud-based and local deployment solutions are becoming apparent. The high latency of cloud-based inference and the high power consumption of local inference on mobile devices greatly degrade the user experience of neural network applications. To address this problem, this paper proposes a hierarchical deployment and inference acceleration model for deep neural networks based on fog computing. First, the solution space of deep neural network deployment is searched and reallocated, and a Solution Space Tree Pruning (SSTP) based deployment algorithm is designed to select appropriate network layers for deployment, reducing the overall inference delay of the deep neural network. Next, an algorithm for Maximizing Accuracy based on Guaranteed Latency (MAL) is designed. On the solution-space tree pruned by the SSTP algorithm, appropriate fog computing nodes are selected for mobile devices in different geographic locations, allowing inference tasks to exit early according to the operational delay and inference accuracy requirements of the actual device terminals. The experimental results show that the proposed fog computing-based inference acceleration model reduces average latency by 44.79% compared to traditional cloud-deployed deep neural network inference, and by 28.75% compared to an edge computing acceleration framework from existing studies. The model meets the minimum latency and accuracy requirements of neural network inference in multiple fog computing scenarios, while greatly reducing the performance overhead and cost imposed on the cloud under the traditional cloud computing model.
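The abstract names the SSTP and MAL algorithms without giving their formulations. The following is a minimal Python sketch of the underlying idea only: prune deployment plans whose estimated latency already exceeds the budget (in the spirit of SSTP), then pick the highest-accuracy early-exit plan among the survivors (in the spirit of MAL). All class names, fields, and numbers are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative sketch only: the paper's actual SSTP/MAL formulations are not
# reproduced here; every structure and value below is an assumption.

@dataclass
class Candidate:
    """One deployment plan: where the network is split and which exit is used."""
    split_layer: int    # last layer executed on the mobile device
    exit_layer: int     # early-exit branch evaluated on the fog node
    latency_ms: float   # estimated end-to-end inference latency
    accuracy: float     # estimated accuracy at this exit point


def sstp_prune(candidates: List[Candidate], latency_budget_ms: float) -> List[Candidate]:
    """Prune the solution space: discard any plan whose estimated latency
    already violates the budget, so deeper branches need not be explored."""
    return [c for c in candidates if c.latency_ms <= latency_budget_ms]


def mal_select(candidates: List[Candidate],
               latency_budget_ms: float,
               min_accuracy: float) -> Optional[Candidate]:
    """Among plans satisfying the latency guarantee, return the one with the
    highest accuracy (early exit permitted); None if no plan is feasible."""
    feasible = [c for c in sstp_prune(candidates, latency_budget_ms)
                if c.accuracy >= min_accuracy]
    return max(feasible, key=lambda c: c.accuracy, default=None)


if __name__ == "__main__":
    # Hypothetical candidate deployments enumerated from a solution-space tree.
    plans = [
        Candidate(split_layer=2, exit_layer=5,  latency_ms=38.0, accuracy=0.87),
        Candidate(split_layer=4, exit_layer=8,  latency_ms=55.0, accuracy=0.91),
        Candidate(split_layer=6, exit_layer=12, latency_ms=90.0, accuracy=0.94),
    ]
    best = mal_select(plans, latency_budget_ms=60.0, min_accuracy=0.85)
    print(best)  # -> the 55 ms / 0.91-accuracy plan under this budget
```

Under this toy setup, the 90 ms plan is pruned by the latency budget and the 55 ms plan wins on accuracy; the paper's actual algorithms additionally account for node geography and per-device requirements.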