Abstract

Acceleration of deep neural network (DNN) inference has recently gained increasing attention with the wide adoption of DNNs in practical applications. For computer vision tasks, where the inputs are images, existing work mostly focuses on improving the throughput of inference over multiple images. However, many real-time applications require reducing the latency of single-image inference, which is more difficult than improving throughput because of the inherent data dependencies. From the human brain's perspective, the complexity of our visual surroundings is first encoded as a pattern of light on a two-dimensional array of photoreceptors, with little direct resemblance to the original input or the ultimate percept. Within just a few hundred microns of retinal thickness, this initial signal encoded by the photoreceptors must be transformed into an adequate representation of the entire visual scene. Inspired by how the retina helps the human brain take in new information efficiently, we present an end-to-end structured framework that can be built on any existing convolutional neural network (CNN) as the backbone. The proposed framework, called VisualNet, creates task parallelism for the backbone during the inference of a single image. Experiments with a number of neural networks on the ImageNet and CIFAR-10 classification tasks, on both GPUs and CPUs, show that VisualNet reduces the latency of the regular network it builds on by up to 80.6% when both are fully parallelized with state-of-the-art acceleration libraries, while achieving similar or slightly higher accuracy.
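
The abstract does not specify VisualNet's exact branch structure, but its key idea, running independent sub-computations of one forward pass concurrently to cut single-image latency, can be illustrated with a minimal sketch. The example below is a hypothetical two-branch PyTorch module; the `TwoBranchNet` name, the branch shapes, and the use of CUDA streams are all assumptions for illustration, not the paper's actual method. When two branches share no data dependency until a final merge, their kernels can be launched on separate CUDA streams so the GPU may overlap them during a single image's inference.

```python
# Minimal sketch of task parallelism inside a single forward pass.
# NOTE: an illustrative toy, not the VisualNet architecture; branch
# sizes and names are hypothetical.
import torch
import torch.nn as nn


class TwoBranchNet(nn.Module):
    """Two data-independent branches merged by a small classifier head."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.is_cuda:
            # Run the branches on separate CUDA streams; the GPU can
            # overlap their kernels when resources allow.
            cur = torch.cuda.current_stream()
            s_a, s_b = torch.cuda.Stream(), torch.cuda.Stream()
            s_a.wait_stream(cur)  # ensure x is ready on both streams
            s_b.wait_stream(cur)
            with torch.cuda.stream(s_a):
                out_a = self.branch_a(x)
            with torch.cuda.stream(s_b):
                out_b = self.branch_b(x)
            cur.wait_stream(s_a)  # rejoin before the dependent merge
            cur.wait_stream(s_b)
            out_a.record_stream(cur)  # keep the caching allocator stream-safe
            out_b.record_stream(cur)
        else:
            out_a = self.branch_a(x)  # sequential fallback on CPU
            out_b = self.branch_b(x)
        return self.head(torch.cat([out_a, out_b], dim=1))


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TwoBranchNet().to(device).eval()
    image = torch.randn(1, 3, 224, 224, device=device)  # a single image
    with torch.no_grad():
        print(model(image).shape)  # -> torch.Size([1, 10])
```

On a CPU, similar task parallelism could instead be obtained by dispatching the branches to separate threads; in either case, the achievable speedup depends on how well the branch workloads fill the hardware.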
