A dynamically configurable coprocessor for convolutional neural networks

Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula, Srihari Cadambi

doi:10.1145/1816038.1815993

Abstract

Convolutional neural network (CNN) applications range from recognition and reasoning (such as handwriting recognition, facial expression recognition and video surveillance) to intelligent text applications such as semantic text analysis and natural language processing. Two key observations drive the design of a new architecture for CNNs. First, CNN workloads exhibit a widely varying mix of three types of parallelism: parallelism within a convolution operation, intra-output parallelism where multiple input sources (features) are combined to create a single output, and inter-output parallelism where multiple, independent outputs (features) are computed simultaneously. Workloads differ significantly across different CNN applications, and across different layers of a CNN. Second, the number of processing elements in an architecture continues to scale (as per Moore's law) much faster than the off-chip memory bandwidth (or pin count) of chips. Based on these two observations, we show that for a given number of processing elements and a given off-chip memory bandwidth, a new CNN hardware architecture that dynamically configures the hardware on-the-fly to match the specific mix of parallelism in a given workload gives the best throughput. Our CNN compiler automatically translates a high-abstraction network specification into a parallel microprogram (a sequence of low-level VLIW instructions) that is mapped, scheduled and executed by the coprocessor. Compared to a 2.3 GHz quad-core, dual-socket Intel Xeon, a 1.35 GHz C870 GPU, and a 200 MHz FPGA implementation, our 120 MHz dynamically configurable architecture is 4x to 8x faster. This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
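
The three types of parallelism named in the abstract can be located in the loop nest of a single CNN layer. The following is a minimal software sketch for illustration only, not the coprocessor's design or the authors' code: the function names `conv2d_valid` and `cnn_layer` are hypothetical, the convolution is computed without kernel flipping, and bias and non-linearity are omitted as simplifying assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over one input feature map (no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # (1) Parallelism within a convolution: the kh * kw products for
            # this output pixel are independent and could be computed at once.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


def cnn_layer(inputs, kernels):
    """Compute all output feature maps of one layer.

    inputs:  list of 2D input feature maps (all the same size)
    kernels: kernels[o][i] is the 2D kernel linking input map i to output map o
    """
    outputs = []
    # (3) Inter-output parallelism: each output map is independent of the
    # others, so iterations of this loop could run simultaneously.
    for per_output_kernels in kernels:
        # (2) Intra-output parallelism: the convolutions of the different
        # input maps that feed ONE output can run simultaneously and are
        # then summed element-wise.
        partials = [conv2d_valid(x, k) for x, k in zip(inputs, per_output_kernels)]
        summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]
        outputs.append(summed)
    return outputs


if __name__ == "__main__":
    # Two 3x3 input maps, two output maps, 2x2 kernels (toy values).
    inputs = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    ]
    kernels = [
        [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
        [[[0, 1], [1, 0]], [[0, 0], [0, 0]]],
    ]
    for feature_map in cnn_layer(inputs, kernels):
        print(feature_map)
```

In this sketch, a layer with many inputs per output and few outputs is dominated by loop (2), while a layer with many outputs and few inputs is dominated by loop (3), which is one way to picture why the mix varies across layers and applications.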

Journal: ACM SIGARCH Computer Architecture News
Publication Date: Jun 19, 2010
Citations: 136

Similar Papers

A dynamically configurable coprocessor for convolutional neural networks
Srimat Chakradhar ... Murugan Sankaradas
19 Jun 2010

Application of convolutional neural networks in image classification and applications of improved convolutional neural networks
Taoyu Liu
Applied and Computational Engineering | VOL. 81
08 Nov 2024

A survey of the recent architectures of deep convolutional neural networks
Asifullah Khan ... Umme Zahoora
Artificial Intelligence Review | VOL. 53
21 Apr 2020

Convolutional Neural Network and Its Advances: Overview and Applications
Jyoti S Raghatwan ... Sandhya Arora
20 Sep 2021