4D-CGRA: Introducing Branch Dimension to Spatio-Temporal Application Mapping on CGRAs

Manupa Karunaratne, Tulika Mitra, Li-Shiuan Peh, Dhananjaya Wijerathne

doi:10.1109/iccad45719.2019.8942148

Abstract

Coarse-Grained Reconfigurable Arrays (CGRAs) are a promising class of accelerators that provide a good balance between flexibility, performance, and power. Because CGRAs are designed to support dataflow, acceleration is limited to loops with simple control flow. The compiler generates static schedules of loop kernels on the CGRA and completely removes the burden of resource-conflict resolution from the hardware. In the presence of complex control flow, however, static scheduling on a CGRA requires independent resource reservations for mutually-exclusive dataflows along control-divergent paths. Such reservations are not only wasteful but also limit performance by increasing the schedule length. We introduce a novel architecture, 4D-CGRA, that encourages mutually-exclusive dataflows to map to the same set of resources while allowing the appropriate dataflows to execute at runtime based on the branch outcomes. We achieve this by introducing a new, architecture-enabled branch dimension corresponding to the branching decisions. We design a novel compiler that models integrated placement and routing in four dimensions (two spatial, one temporal, one branch). 4D-CGRA achieves up to 2.33x (average 1.44x) performance gain over a generic CGRA with the same area and power budget.
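To make the branch dimension concrete, here is a minimal Python sketch under our own simplifying assumptions; it is not the paper's compiler. A mapping slot is a spatial-temporal coordinate (x, y, t), and each operation carries a branch guard recording which branch outcomes it executes under. A generic CGRA allows at most one operation per slot, whereas a branch-aware mapper lets several operations share a slot as long as their guards make them mutually exclusive. All names (`Op`, `guard`, `Mapping`, `map_ops`) are hypothetical, and the toy uses first-fit placement while ignoring data dependencies and routing, which the paper's compiler models together with placement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Slot = Tuple[int, int, int]  # (x, y, t): two spatial coordinates plus a time step


@dataclass(frozen=True)
class Op:
    name: str
    # Hypothetical branch guard: (branch id, required outcome) pairs.
    # An empty guard means the op executes on every path.
    guard: Tuple[Tuple[str, bool], ...] = ()


def mutually_exclusive(a: Op, b: Op) -> bool:
    """True if some branch both ops depend on requires different outcomes,
    so the two ops can never execute in the same loop iteration."""
    ga, gb = dict(a.guard), dict(b.guard)
    return any(k in gb and gb[k] != v for k, v in ga.items())


class Mapping:
    """Toy occupancy table for a static CGRA schedule (no routing modeled)."""

    def __init__(self, branch_aware: bool):
        self.branch_aware = branch_aware
        self.table: Dict[Slot, List[Op]] = {}

    def can_place(self, op: Op, slot: Slot) -> bool:
        occupants = self.table.get(slot, [])
        if not self.branch_aware:
            return not occupants  # generic CGRA: one op per (x, y, t)
        # Branch-aware: share the slot only with mutually-exclusive ops.
        return all(mutually_exclusive(op, other) for other in occupants)

    def place_first_fit(self, op: Op, slots: List[Slot]) -> Slot:
        for slot in slots:
            if self.can_place(op, slot):
                self.table.setdefault(slot, []).append(op)
                return slot
        raise RuntimeError(f"no free slot for {op.name}")


def map_ops(ops: List[Op], branch_aware: bool,
            width: int = 1, height: int = 1, max_t: int = 8):
    """Place ops in order; return the placement and the schedule length."""
    slots = [(x, y, t) for t in range(max_t)
             for y in range(height) for x in range(width)]
    m = Mapping(branch_aware)
    placement = {op.name: m.place_first_fit(op, slots) for op in ops}
    length = 1 + max(t for (_, _, t) in placement.values())
    return placement, length


if __name__ == "__main__":
    # if (c) { A; B } else { C; D } mapped onto a single processing element
    then_g, else_g = (("c", True),), (("c", False),)
    ops = [Op("A", then_g), Op("B", then_g), Op("C", else_g), Op("D", else_g)]
    print("generic:", map_ops(ops, branch_aware=False))
    print("branch-aware:", map_ops(ops, branch_aware=True))
```

In this toy if-else on a single processing element, the generic mapping reserves a separate slot for each of A, B, C, and D (schedule length 4), while the branch-aware mapping lets A share a slot with C and B share one with D (schedule length 2). This mirrors, in a much simplified form, the abstract's point that separate reservations for mutually-exclusive dataflows lengthen the schedule.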
