Abstract

Coarse-Grained Reconfigurable Arrays (CGRAs) are a promising class of accelerators that provide a good balance between flexibility, performance, and power. Because CGRAs are designed to support dataflow, acceleration is limited to loops with simple control flow. The compiler generates static schedules of loop kernels on the CGRA and completely removes the burden of resource conflict resolution from the hardware. In the presence of complex control flow, static scheduling on a CGRA requires independent resource reservations for the mutually exclusive dataflows along control-divergent paths. Such reservations are not only wasteful but also limit performance by increasing the schedule length. We introduce a novel architecture, 4D-CGRA, that encourages mutually exclusive dataflows to map to the same set of resources and selects the appropriate dataflow for execution at runtime based on the branch outcomes. We achieve this by introducing a new, architecture-enabled branch dimension corresponding to the branching decisions. We design a novel compiler that models integrated placement and routing in four dimensions (two spatial, one temporal, one branch). 4D-CGRA achieves up to 2.33x (average 1.44x) performance gain compared to a generic CGRA with the same area and power budget.
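
The resource argument above can be illustrated with a toy count of reserved (PE, cycle) slots. This is only a sketch under simplifying assumptions and is not the paper's compiler: the function `slots_needed`, the path labels `"common"`, `"then"`, and `"else"`, and the example kernel are all hypothetical, and the model ignores routing, data dependencies, and the modulo-scheduling constraints of a real CGRA mapper. It assumes a single two-way branch whose paths are mutually exclusive.

```python
from collections import defaultdict


def slots_needed(ops, share_exclusive_paths):
    """Count the (PE, cycle) slots a toy kernel reserves.

    ops: list of (name, path) pairs, where path is "common" for operations
    executed on every iteration, or a branch label such as "then"/"else"
    for operations on one side of a single branch (hypothetical labels).
    """
    if not share_exclusive_paths:
        # Generic static schedule: every operation gets its own slot,
        # even though only one branch path executes per iteration.
        return len(ops)

    per_path = defaultdict(int)
    for _name, path in ops:
        per_path[path] += 1

    # With a branch dimension, operations on mutually exclusive paths can
    # occupy the same slot, so only the longest path needs reservations.
    exclusive = [n for path, n in per_path.items() if path != "common"]
    return per_path["common"] + (max(exclusive) if exclusive else 0)


# Hypothetical loop body: three common operations plus a two-way branch.
kernel = [
    ("load", "common"),
    ("cmp", "common"),
    ("add", "then"),
    ("mul", "then"),
    ("sub", "else"),
    ("store", "common"),
]

separate = slots_needed(kernel, share_exclusive_paths=False)
shared = slots_needed(kernel, share_exclusive_paths=True)
print(separate, shared)
```

In this toy kernel the shared mapping reserves fewer slots than the separate one because the `"else"` operation overlays one of the `"then"` slots; how much a real schedule shortens depends on the placement and routing constraints that this sketch leaves out.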
