Abstract

Coarse-grained reconfigurable architecture (CGRA) is a promising architecture that combines high performance, high power efficiency and flexibility. The computation-intensive parts of an application (e.g., loops) are often mapped onto a CGRA for acceleration. Because loops demand many parallel data accesses, architectures with multi-bank memory have been proposed to improve parallelism. For CGRAs with multi-bank memory, we propose a joint solution that considers memory partitioning and modulo scheduling simultaneously to achieve a valid mapping with better performance. In this solution, modulo scheduling and operator scheduling are used to obtain a valid loop mapping and a valid data placement without any memory access conflicts. By avoiding the pipeline stalls caused by such conflicts, the performance of loop mapping is greatly improved. Experimental results on benchmarks from Livermore, Polybench and Mediabench show that our approach improves the performance of loops on CGRA by factors of 1.89 $\times$, 1.49 $\times$ and 1.37 $\times$ compared with REGIMap, HTDM and REGIMap with memory partitioning, respectively, at the cost of an acceptable increase in compilation time.
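To make the notion of a memory access conflict concrete, the following is a minimal sketch, not the mapping algorithm proposed here. It assumes cyclic partitioning (bank = address mod number of banks), one port per bank, and loads of the form `a[i + offset]` in iteration `i`; the function name `find_bank_conflicts` and the tuple layout are hypothetical. Two loads in a modulo-scheduled loop with initiation interval `ii` can collide only if they occupy the same modulo slot and their addresses fall in the same bank.

```python
from itertools import combinations

def find_bank_conflicts(accesses, ii, num_banks):
    """Return pairs of loads that hit the same bank in the same cycle.

    accesses: list of (name, cycle, offset), meaning the load is scheduled
              at `cycle` within one iteration and reads a[i + offset].
    ii: initiation interval of the modulo schedule (assumed given).
    num_banks: bank count under the assumed cyclic partitioning.
    """
    conflicts = []
    for (n1, c1, o1), (n2, c2, o2) in combinations(accesses, 2):
        if (c1 - c2) % ii != 0:
            continue  # different modulo slots: never issued in the same cycle
        # In a shared cycle the two loads belong to iterations that differ
        # by (c2 - c1) / ii, so their addresses differ by that gap plus the
        # difference in offsets.
        iter_gap = (c2 - c1) // ii
        if (iter_gap + o1 - o2) % num_banks == 0:
            conflicts.append((n1, n2))
    return conflicts

# Hypothetical loop body with three loads from the same array.
loads = [("ld_a0", 0, 0), ("ld_a2", 0, 2), ("ld_a1", 1, 1)]
two_banks = find_bank_conflicts(loads, ii=2, num_banks=2)
three_banks = find_bank_conflicts(loads, ii=2, num_banks=3)
```

With two banks, `ld_a0` and `ld_a2` share a slot and a bank, so the pipeline would stall; with three banks, or with one of the loads moved to another slot, the conflict disappears. This is the kind of interaction between scheduling and partitioning that a joint solution can exploit.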
