Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

Wonsub Kim, Haewoo Park, Yoonseo Choi

doi:10.1145/2541228.2555314

Abstract

Coarse-Grained Reconfigurable Architectures (CGRAs) offer the potential for high compute throughput together with energy efficiency. A CGRA consists of an array of Functional Units (FUs), which communicate with each other through an interconnect network containing transmission nodes and register files. To achieve high performance from software mapped onto CGRAs, modulo scheduling of loops is generally employed. One of the key challenges in modulo scheduling for CGRAs is to explicitly handle the routing of operands from a source operation to a destination operation through various routing resources. Existing modulo schedulers for CGRAs are slow because finding a valid routing is generally a search problem over a large space, even with the guidance of well-defined cost metrics. Applications in traditional embedded multimedia domains are regarded as relatively tolerant of slow compilation in exchange for a high-quality solution. However, many rapidly growing application domains, such as 3D graphics, require fast compilation. The entry of CGRAs into these domains has been blocked mainly by their long compile times. We attack this problem by utilizing patternized routes, for which the resources and time slots needed for a successful routing can be estimated in advance, at the moment a source operation is placed. By conservatively reserving predefined resources at predefined time slots, future routings originating from the source operation are guaranteed to succeed. Experiments on a real-world 3D graphics benchmark suite show that our scheduler reduces compile time by up to 6,000 times while achieving, on average, 70% of the throughput of the state-of-the-art CGRA modulo scheduler, the Edge-centric Modulo Scheduler (EMS).
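The idea of conservatively reserving a predefined route when a source operation is placed can be illustrated with a small sketch. This is not the authors' implementation: the class `ModuloReservationTable`, the method `reserve_pattern`, the resource names, and the representation of a route pattern as a list of (resource, cycle offset) pairs are all assumptions made for illustration. The sketch only shows the general technique of checking and claiming every slot of a pattern, modulo the initiation interval (II), in one step instead of searching for a route later.

```python
from typing import Dict, List, Tuple

Slot = Tuple[str, int]          # (resource name, time slot modulo II)
Pattern = List[Tuple[str, int]]  # hypothetical: (routing resource, cycle offset)


class ModuloReservationTable:
    """Tracks which resource is busy at which slot of a loop with interval II."""

    def __init__(self, ii: int) -> None:
        self.ii = ii
        self.owner: Dict[Slot, str] = {}

    def is_free(self, resource: str, time: int) -> bool:
        return (resource, time % self.ii) not in self.owner

    def reserve_pattern(self, op: str, fu: str, time: int, pattern: Pattern) -> bool:
        """Place `op` on `fu` at `time` and reserve its whole route pattern.

        Either every slot is reserved or none is, so a failed attempt
        leaves the table unchanged.
        """
        slots = [(fu, time % self.ii)]
        slots += [(res, (time + off) % self.ii) for res, off in pattern]
        if len(set(slots)) != len(slots):
            return False  # the pattern wraps around and collides with itself
        if any(slot in self.owner for slot in slots):
            return False  # some slot is already taken by another operation
        for slot in slots:
            self.owner[slot] = op
        return True


# Hypothetical pattern: one cycle on a bus, then one cycle in a register file.
ROUTE: Pattern = [("bus0", 1), ("rf0", 2)]

mrt = ModuloReservationTable(ii=2)
placed_a = mrt.reserve_pattern("a", "fu0", 0, ROUTE)
placed_b = mrt.reserve_pattern("b", "fu1", 1, ROUTE)
placed_c = mrt.reserve_pattern("c", "fu2", 2, ROUTE)  # clashes with "a" modulo II
```

In this sketch, placement costs one pass over a fixed-size pattern, which is where the compile-time saving would come from; the price is that reserved slots may go unused, which is consistent with the throughput loss the abstract reports.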

Full Text