Thanks to the recognition and promotion of chiplet-based High-Performance Computing (HPC) system design by semiconductor industry leaders, chiplet-based multi-chip systems have gradually become mainstream. Unfortunately, programming such systems to compute efficiently is challenging, especially for dynamic task parallelism. This paper presents an Adaptive Batch-Stream Scheduling (ABSS) module for dynamic task parallelism on chiplet-based multi-chip systems. To this end, we propose an adaptive batch-stream scheduling method that uses a Graph Convolution Network (GCN) classifier to select the appropriate scheduling scheme. We further design a chiplet-based core-cluster binding mechanism, which establishes the affinity between threads and core-clusters on the CPU-compute die. Moreover, to balance the workload dynamically, we propose a chiplet-based nearest task stealing method. We implement the ABSS module on the HiSilicon Kunpeng-920 chiplet-based multi-chip system. Experiments show that it outperforms state-of-the-art parallelism solutions such as Intel Threading Building Blocks.
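The abstract does not spell out how nearest task stealing chooses a victim, so the following is only a minimal sketch of the general idea under stated assumptions: an idle worker probes victims in order of topological distance (same core-cluster first, then same CPU-compute die, then same chip, then remote chip). The `Worker` fields, the four-level `distance` function, and the `steal` helper are hypothetical names introduced for illustration and are not taken from the ABSS implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A worker thread bound to one core-cluster (hypothetical model)."""
    wid: int
    cluster: int  # core-cluster index within the die (assumed topology level)
    die: int      # CPU-compute die index within the chip (assumed)
    chip: int     # chip/socket index (assumed)
    tasks: deque = field(default_factory=deque)


def distance(a: Worker, b: Worker) -> int:
    """Assumed topological distance: smaller means cheaper to steal from."""
    if a.chip != b.chip:
        return 3
    if a.die != b.die:
        return 2
    if a.cluster != b.cluster:
        return 1
    return 0


def steal_order(thief: Worker, workers: list) -> list:
    """Order candidate victims from nearest to farthest (ties broken by id)."""
    victims = [w for w in workers if w.wid != thief.wid]
    return sorted(victims, key=lambda w: (distance(thief, w), w.wid))


def steal(thief: Worker, workers: list):
    """Take the oldest task from the nearest non-empty victim, or None."""
    for victim in steal_order(thief, workers):
        if victim.tasks:
            return victim.tasks.popleft()
    return None
```

In this sketch a thief only reaches across a die or chip boundary once every closer queue is empty, which is the locality property the term "nearest" suggests; a real scheduler would also need synchronized deques and a measured, platform-specific distance table.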