Abstract

With the application of deep learning on embedded platforms, the accuracy and performance requirements of deep neural network models have risen sharply, and model complexity is growing exponentially, which greatly increases the computational cost and time of inference. Because embedded platforms have limited computational resources, deployed network models suffer from sluggish response and high latency. To accelerate network model inference and reduce the impact of latency, we propose using the JTRT technique to accelerate the inference of deep neural network models. To verify the capability of the JTRT technique, we apply it to a variety of neural network models. The experimental results show that JTRT can compress these models and perform accelerated inference using half-precision computation, thereby optimizing and speeding up the network models. The largest improvement is achieved for the ResNet architecture, with a 9.42-fold speedup, and JTRT's acceleration performance is better for deeper network models.
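Although the paper's JTRT toolchain is not shown in this abstract, a half-precision acceleration workflow of the kind described can be sketched with NVIDIA TensorRT's Python API (this is an assumption for illustration; the exact JTRT interface, the TensorRT 8.x-style calls, the ONNX file name, and the engine file name below are not taken from the paper).

```python
# Minimal sketch: building an FP16 (half-precision) inference engine with
# NVIDIA TensorRT's Python API (TensorRT 8.x-style calls assumed).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path: str):
    """Parse an ONNX model and build a serialized TensorRT engine with FP16 enabled."""
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    # Half-precision kernels are the main source of the reported speedup,
    # so enable FP16 only if the target device supports it efficiently.
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # Returns serialized engine memory that can be deserialized for inference
    # on the target (e.g., embedded) device.
    return builder.build_serialized_network(network, config)

if __name__ == "__main__":
    # "resnet50.onnx" is a hypothetical exported model file, not from the paper.
    engine_bytes = build_fp16_engine("resnet50.onnx")
    with open("resnet50_fp16.engine", "wb") as f:
        f.write(engine_bytes)
```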
