Abstract

Sparsity in the weights of deep convolutional networks presents a tremendous opportunity to reduce computational requirements. To optimize the flow of traffic systems, any viable solution must operate in real time, yet existing computation frameworks do not realize the full speedup afforded by sparse neural networks, and the power consumption of a GPU is too great for widely distributed, embedded optimization systems. Here, the authors propose a procedure for realizing the potential of sparse convolutional kernels on CPUs: after a preprocessing step, a code generator emits well-optimized, deployable code. Benchmarking CPU-mode TensorFlow, GPU-mode TensorFlow, and the proposed solution on two sparse convolutional neural networks shows that the proposed solution is 2 to 5 times faster than CPU-mode TensorFlow and consumes less power than GPU-mode TensorFlow. The runtime of the proposed solution is 0.13 s per 321 × 321 RGB image on a 98%-sparse network, 5 times faster than CPU-mode TensorFlow.
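The core idea, exploiting weight sparsity by doing work only for nonzero kernel taps, can be sketched as follows. This is a hypothetical illustration, not the paper's generated code: the kernel is flattened to a list of nonzero (offset, weight) triples once, and the inner loop then touches only those entries, so the cost scales with the number of nonzeros rather than the full kernel size.

```python
# Hypothetical sketch of sparse direct convolution on CPU.
# Uses cross-correlation, as is conventional in CNNs.

def to_sparse(kernel):
    """Collect (dy, dx, w) triples for the kernel's nonzero entries."""
    return [(dy, dx, w)
            for dy, row in enumerate(kernel)
            for dx, w in enumerate(row)
            if w != 0.0]

def sparse_conv2d(image, kernel):
    """Valid-mode 2D convolution that skips zero kernel weights."""
    kh, kw = len(kernel), len(kernel[0])
    nonzeros = to_sparse(kernel)  # precomputed once per kernel
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for dy, dx, w in nonzeros:  # work ∝ number of nonzeros
                acc += w * image[y + dy][x + dx]
            out[y][x] = acc
    return out

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[0.0, 1.0],
          [0.0, 0.0]]  # 75% sparse: a single nonzero tap
print(sparse_conv2d(image, kernel))  # → [[2.0, 3.0], [5.0, 6.0]]
```

A code generator, as proposed in the paper, can go further than this interpreted loop by unrolling the nonzero pattern into straight-line code at compile time, eliminating the per-tap indirection entirely.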
