Abstract

A simple AXI-based convolution architecture for deep learning is presented. Input feature maps and kernel weights are stored in P K×K memory blocks, and convolution proceeds from output feature map 0 to M-1; within each feature map, outputs are generated in raster-scan order. Data from P input feature maps are summed in parallel during convolution. By manipulating the read addresses and read-data alignment, P K×K input feature map values, P K×K weights, and the bias for the input and output feature maps being processed can all be supplied in each cycle. Dual buffers allow convolution of the current output feature map to proceed while the DMA write of the previous final output feature map is in progress. Correct operation was verified by comparing RTL simulation results against a C program run. This method provides over a 2,000× speed-up compared to a pure software implementation, and with flow control between the DMA and the convolution unit, much less memory is required. The architecture is suitable for convolution acceleration in moderate deep learning applications on embedded systems.
