Abstract

Deep convolutional neural networks take days of GPU time to train on large data sets. Pedestrian detection for self-driving cars requires very low latency, and image recognition on mobile phones is constrained by limited processing resources. The success of convolutional neural networks in these situations is limited by how fast they can be computed. Conventional FFT-based convolution is fast for large filters, but state of the art convolutional neural networks use small, 3 × 3 filters. We introduce a new class of fast algorithms for convolutional neural networks based on Winograd's minimal filtering algorithms. These algorithms compute minimal-complexity convolution over small tiles, which makes them fast with small filters and small batch sizes. We benchmark a GPU implementation of our algorithm on the VGG network and show state of the art throughput at batch sizes from 1 to 64.
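To make "minimal-complexity convolution over small tiles" concrete, below is a short sketch of F(2,3), the one-dimensional Winograd minimal filtering algorithm that the paper nests to build its 2D variants. It produces two outputs of a 3-tap filter with four multiplications instead of the six required by direct computation; the function and variable names are ours, but the arithmetic follows the minimal filtering formulas.

```python
import numpy as np

def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap filter over a 4-element input
    tile, using 4 multiplications instead of the 6 needed directly."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

# Sanity check against direct computation of the same two outputs.
d, g = np.random.randn(4), np.random.randn(3)
direct = np.array([d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
                   d[1] * g[0] + d[2] * g[1] + d[3] * g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

Nesting F(2,3) with itself gives the 2D algorithm F(2×2, 3×3), which uses 16 multiplications where direct convolution of each 2 × 2 output tile needs 36, a 2.25× reduction in arithmetic complexity.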

Highlights

  • Deep convolutional neural networks achieve state of the art results in image recognition [1,2]

  • Large batch sizes can adversely affect network convergence, which places an upper limit on cluster size; the smallest batch size that can be computed efficiently therefore bounds how far training can scale

  • State of the art throughput was achieved at all batch sizes from 1 to 64 on an NVIDIA Maxwell graphics processing unit (GPU), using at most 16 MB of workspace memory

Summary

Introduction

Deep convolutional neural networks (convnets) achieve state of the art results in image recognition [1,2]. These networks take several days of GPU time to train and require significant compute resources during classification as well. Likewise, when convnets are applied to low latency inference problems, such as pedestrian detection in self-driving car video imagery, how fast a small set of images can be classified limits what can be detected. Distributed training of convnets can be achieved by partitioning each batch of examples across the nodes of a cluster and accumulating weight updates across them. Large batch sizes can adversely affect network convergence, which places an upper limit on cluster size, so the smallest batch size that can be computed efficiently determines how far training can scale. State of the art throughput was achieved at all batch sizes from 1 to 64 on NVIDIA Maxwell GPUs, using at most 16 MB of workspace memory.
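As a rough illustration of why small-batch efficiency matters for scaling, the sketch below shows the data-parallel pattern described above: each node computes a gradient on its slice of the batch, and the weight updates are accumulated across nodes. All names here (distributed_step, grad_fn) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def distributed_step(weights, node_batches, grad_fn, lr=0.01):
    """One data-parallel step: each node computes a gradient on its
    own sub-batch; the updates are accumulated (averaged) across
    nodes and applied once to the shared weights."""
    grads = [grad_fn(weights, batch) for batch in node_batches]
    update = sum(grads) / len(grads)  # accumulate across nodes
    return weights - lr * update

# Toy usage: a least-squares gradient stands in for a convnet.
# With a fixed global batch, adding nodes shrinks each node's
# sub-batch, so per-node throughput at small batch sizes caps
# how large the cluster can usefully grow.
grad_fn = lambda w, batch: 2 * (w - batch.mean(axis=0))
node_batches = np.split(np.random.randn(64, 3), 8)  # 8 nodes
w = distributed_step(np.zeros(3), node_batches, grad_fn)
```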

Related Work
Fast Fourier Transform for CNNs
Arithmetic Complexity Analysis
GPU Implementation
Experiments on Different Networks
Results and Discussion

