Coarse-grained reconfigurable architectures (CGRAs) are increasingly employed as domain-specific accelerators due to their efficiency and flexibility. A CGRA typically relies on compilers to perform task scheduling. The longstanding problem of static scheduling is that it suffers from insufficient parallelism in handling irregularities due to over-serialization and workload imbalance, which leads to severe resource underutilization and performance loss. To counteract the limitations of static scheduling in CGRAs, it is essential to exploit dynamic parallelism automatically and manage hardware resources adaptively. However, existing dynamic scheduling mechanisms, e.g., work stealing, often reschedule aggressively for instant performance but sacrifice efficiency, which is unfavorable to CGRAs that emphasize efficiency and fewer reconfigurations. This article proposes an elastic task scheduling scheme that enables lightweight dynamic scheduling in CGRAs. Tasks are rescheduled at runtime according to the classic tagged-token dataflow paradigm to enable dynamic task-level parallelism. Meanwhile, tasks are dynamically resized according to run-time throughputs via duplication, combination, and substitution operators for balanced multitask execution. We implement the elastic task scheduling scheme on a well-known reconfigurable architecture - triggered instruction architecture (TIA). Evaluation on the MachSuite benchmarks shows that the proposed scheme is effective in improving performance and energy efficiency. The average speedup is 2× over the baseline. Also, our design attains a 57 percent improvement in the area-normalized performance and a 49 percent better energy efficiency. Compared with a state-of-the-art dynamic scheduling method, our scheme achieves 1.6× speedup and 1.6× energy efficiency than work-stealing mechanism on the same substrate.
Read full abstract