Abstract

To maximize the cost-effectiveness of neural network (NN) accelerators, architects are actively developing single-chip accelerators that can execute many NNs simultaneously. However, previous approaches exploit only spatial or temporal resource sharing (SS or TS) and therefore fall short of the full performance potential. They also neglect memory management, which can significantly affect performance. These limitations call for a new multi-NN accelerator that exploits both sharing opportunities with careful memory management. Designing such an ideal spatio-temporal sharing (STS) accelerator is extremely challenging, however, because it requires (1) an algorithm that determines the degree of SS/TS within a large exploration space, (2) a new STS-enabled accelerator architecture spanning diverse design points, and (3) carefully designed memory management that minimizes resource contention during the numerous data transfers triggered by reconfiguration. To this end, we propose STfusion, a fast and flexible multi-NN execution architecture. First, STfusion partitions an accelerator into multiple smaller TS-enabled accelerators. Second, STfusion dynamically fuses these small accelerators to adjust accelerator sizes. Third, STfusion manages the on-chip buffer at page granularity for stall-free data transfers. Lastly, STfusion provides an algorithm that determines the degree of SS/TS to achieve high throughput while satisfying QoS goals. Our evaluation shows that STfusion significantly outperforms state-of-the-art multi-NN accelerators.
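The abstract's core idea, partitioning an accelerator into small time-shared units and dynamically fusing them for spatial sharing, can be illustrated with a minimal sketch. This is a hypothetical model written for illustration only; the class and method names (`STSAccelerator`, `fuse`, `schedule`) are assumptions and do not come from the paper, and the paper's actual resource-allocation algorithm and page-granularity buffer management are not reproduced here.

```python
# Hypothetical sketch of spatio-temporal sharing (STS): the accelerator is
# partitioned into small sub-accelerators; spatial sharing (SS) fuses a
# group of them into one larger unit, while temporal sharing (TS)
# time-multiplexes several NNs on the same fused group.
from dataclasses import dataclass, field

@dataclass
class FusedGroup:
    sub_accels: list                                # fused sub-accelerator IDs (SS degree = len)
    queue: list = field(default_factory=list)       # NNs time-sharing this group (TS degree = len)

class STSAccelerator:
    def __init__(self, num_sub_accels):
        self.free = list(range(num_sub_accels))     # unallocated sub-accelerators
        self.groups = []

    def fuse(self, ss_degree):
        """Fuse `ss_degree` free sub-accelerators into one larger unit (SS)."""
        if ss_degree > len(self.free):
            raise ValueError("not enough free sub-accelerators")
        taken, self.free = self.free[:ss_degree], self.free[ss_degree:]
        group = FusedGroup(taken)
        self.groups.append(group)
        return group

    def schedule(self, group, nn_name):
        """Add an NN to a fused group's time-sharing queue (TS)."""
        group.queue.append(nn_name)
        return group

# Example: an 8-way accelerator runs one large NN on a 4-wide fused group
# while two smaller NNs time-share a 2-wide group.
accel = STSAccelerator(8)
big = accel.fuse(4)
accel.schedule(big, "nn_large")
small = accel.fuse(2)
accel.schedule(small, "nn_a")
accel.schedule(small, "nn_b")
```

In this toy model, choosing `ss_degree` per group and the queue lengths per group corresponds to the SS/TS degree decision the abstract says must be made algorithmically to balance throughput against QoS goals.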
