A configurable multiplex data transfer model for asynchronous and heterogeneous FPGA accelerators on single DMA device

Zhangqin Huang, Shuo Zhang, Han Gao, Xiaobo Zhang, Shengqi Yang

doi:10.1016/j.micpro.2020.103174

Abstract

To reduce DMA utilization for multiple algorithm IPs on FPGA, a channel-configurable, multiplexed DMA device (CMDMA) is proposed for asynchronous and heterogeneous algorithm IPs. First, we abstract the entities and data flow in the CMDMA system with a formal description for function definition and workflow analysis. Then, based on these functions and the workflow, we design and implement a prototype of CMDMA, which includes the CMDMA software driver (SW) and the hardware circuits (HW) of one DMA IP, a configurable input switch (CISwitch), the algorithm IPs, and an asynchronous output switch (AOSwitch). The configurable function of CMDMA is implemented at the HW level by CISwitch through a configuration port, and a configurable Round-Robin (CRR) algorithm is proposed to schedule channels and input data at the SW level. For output, a channel-distinguishable output buffer (ChnDistBuf) is proposed, which is able to deliver the channel ID and data size to SW before an algorithm IP finishes. With a double-interrupt coordination method involving both ChnDistBuf and the algorithm IPs, CMDMA is able to successively store complete output data from different algorithm IPs. Experiments with 4 heterogeneous matrix multiplication algorithm IPs on the Xilinx Zynq platform show that CMDMA improves the average acceleration rate of a single algorithm IP by about 8%-29% compared with the exclusive method, in which one DMA serves only one algorithm IP, and that it increases DMA input and output data throughput by about 10–40 MB/s and 5–15 MB/s, respectively, with multiple algorithm IPs running in parallel. Moreover, the additional LUT and FF resources used by CMDMA are 756 and 1219, respectively, each about 1% of the Zynq platform's resources. In addition, in a test with two CNN algorithm IPs on an MNIST application, an enhanced data-broadcasting function in CMDMA is 4 s faster than a system with 4 exclusive DMAs running in parallel, while using 3 fewer DMAs and 0.03 W less power.
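The abstract does not specify how the CRR algorithm works internally. As a rough illustration of the general idea only (several input channels taking turns on a single DMA, with a per-channel setting that can be configured), the following is a minimal sketch of a quota-based round-robin scheduler. The `Channel` class, the `quota` parameter, and the `crr_schedule` function are hypothetical names introduced here; they are not taken from the paper or from any Xilinx driver.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Input queue of one algorithm IP (hypothetical model, not the paper's driver)."""
    chan_id: int
    quota: int  # assumed config knob: max transfers per round for this channel
    pending: deque = field(default_factory=deque)

    def __post_init__(self):
        # A quota below 1 would starve the channel and the scheduler would never finish.
        if self.quota < 1:
            raise ValueError("quota must be at least 1")


def crr_schedule(channels):
    """Yield (chan_id, buffer) pairs in round-robin order, honouring each quota."""
    while any(ch.pending for ch in channels):
        for ch in channels:
            for _ in range(ch.quota):
                if not ch.pending:
                    break
                # In a real system, this is roughly where the input switch would be
                # configured so the single DMA stream reaches channel ch.chan_id.
                yield ch.chan_id, ch.pending.popleft()


if __name__ == "__main__":
    chans = [
        Channel(0, quota=2, pending=deque(["a0", "a1", "a2"])),
        Channel(1, quota=1, pending=deque(["b0", "b1"])),
    ]
    for chan_id, buf in crr_schedule(chans):
        print(chan_id, buf)
```

In this sketch, raising a channel's `quota` lets it send more buffers per round, which is one plausible reading of "configurable"; the scheduling policy in the actual CMDMA driver may differ.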


Published in: Microprocessors and Microsystems
