Research interest and industry investment in edge computing solutions have increased dramatically in recent years. The resulting demand for a balance of performance, energy efficiency and flexibility has made Coarse-Grained Reconfigurable Array (CGRA) architectures increasingly popular. To further improve performance and energy efficiency, several hardware- and software-based loop optimizations have been adopted for CGRAs. In this paper, we propose a centralized hardware-based loop optimization technique that achieves better area and energy results than the previously implemented distributed version. With no performance degradation, the area overhead relative to the reference architecture is reduced to \(1.5\%\) for a 4\(\times\)2 CGRA configuration. Compared to the baseline model employing a software loop, the centralized hardware loop reduces energy consumption by up to \(47.3\%\), with an arithmetic mean reduction of \(27.2\%\). Furthermore, the paper explores the coexistence of CGRA-specific hardware and software optimizations and their impact on loop efficiency. Coupling loop unrolling with centralized hardware loop support improves the results further: the combination achieves up to a \(68.7\%\) reduction in energy consumption and a 5.46\(\times\) speed-up over the baseline model with no optimizations applied.