Abstract

While deep neural networks have achieved superior performance in a variety of intelligent applications, their increasing computational complexity makes them difficult to deploy on resource-constrained devices. To improve the performance of on-device inference, prior studies have explored various approximate strategies, such as neural network pruning, that optimize models according to different principles. However, combining these approximate strategies requires exploring a large configuration space, and different configuration parameters may interfere with one another, degrading the effectiveness of performance optimization. In this paper, we propose CoAxNN, a novel model optimization framework that effectively combines different approximate strategies to facilitate on-device deep learning via model approximation. Based on the principles of the individual approximate optimizations, our approach constructs a design space and automatically finds reasonable configurations through genetic-algorithm-based design space exploration. By combining the strengths of different approximation methods, CoAxNN enables efficient conditional inference at runtime. We evaluate our approach with state-of-the-art neural networks on a representative intelligent edge platform, Jetson AGX Orin. The experimental results demonstrate the effectiveness of CoAxNN, which achieves up to 1.53× speedup while reducing energy consumption by up to 34.61%, with negligible accuracy loss on the CIFAR-10/100 and CINIC-10 datasets.
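
The abstract refers to genetic-algorithm-based design space exploration over combined approximation parameters. As a rough illustration of that idea, and not the paper's actual implementation, the following minimal Python sketch evolves configurations over a hypothetical two-parameter space consisting of a pruning ratio and an early-exit confidence threshold; the `evaluate` fitness function is a stand-in for the on-device accuracy and latency profiling such a framework would perform.

```python
# Minimal sketch of genetic-algorithm-based design space exploration.
# The design space (pruning_ratio, exit_threshold) and the fitness model
# are illustrative assumptions, not the configuration space used by CoAxNN.
import random

POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 0.2

def random_config():
    # Hypothetical configuration: pruning ratio and early-exit threshold.
    return {"pruning_ratio": random.uniform(0.0, 0.9),
            "exit_threshold": random.uniform(0.5, 1.0)}

def evaluate(cfg):
    # Placeholder fitness: a real framework would deploy the optimized model
    # and measure accuracy, latency, and energy on the target device. Here,
    # aggressive pruning is rewarded and mock accuracy loss is penalized.
    speedup = 1.0 + cfg["pruning_ratio"]
    acc_loss = cfg["pruning_ratio"] ** 2 * (1.1 - cfg["exit_threshold"])
    return speedup - 5.0 * acc_loss

def crossover(a, b):
    # Uniform crossover: each parameter is inherited from either parent.
    return {k: random.choice((a[k], b[k])) for k in a}

def mutate(cfg):
    # Occasionally perturb one parameter, clamped to its valid range.
    if random.random() < MUTATION_RATE:
        key = random.choice(list(cfg))
        lo, hi = (0.0, 0.9) if key == "pruning_ratio" else (0.5, 1.0)
        cfg[key] = min(hi, max(lo, cfg[key] + random.gauss(0, 0.1)))
    return cfg

population = [random_config() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=evaluate, reverse=True)
    parents = population[: POP_SIZE // 2]  # elitist selection of the top half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print("best configuration:", max(population, key=evaluate))
```

In practice the fitness evaluation dominates the search cost, which is why a population-based method that reuses profiling results across generations is a natural fit for this kind of configuration search.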
