Abstract

As deep neural network models become deeper and more complex, processing devices with stronger computing performance and communication capability are required. Following this trend, reliance on multichip many-core systems, which offer high parallelism and reasonable transmission costs, is on the rise. In this work, to improve routing performance of the system in terms of routing runtime and power consumption, we propose a reinforcement learning (RL)-based core placement optimization approach that accounts for application constraints, such as deadlock caused by multicast paths. We leverage the capability of deep RL to learn from indirect supervision as a direct nonlinear optimizer, and the parameters of the policy network are updated by proximal policy optimization. Because the routing topology is naturally a network graph, we use a graph convolutional network to embed its features into the policy network. A single-step environment is designed so that all cores are placed simultaneously. To handle the high-dimensional action space, the policy network outputs continuous values matching the number of cores, which are then discretized to obtain the new placement. For multichip system mapping, we develop a community detection algorithm. We evaluate our agent on several multilayer perceptron and convolutional neural network datasets, and compare the optimal results obtained by our agent against other baselines under different multicast conditions. Our approach achieves a significant reduction in routing runtime, communication cost, and average traffic load, along with deadlock-free performance for intrachip data transmission. The traffic of interchip routing is also significantly reduced after integrating the community detection algorithm into our agent.
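The continuous-to-discrete action mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the policy network emits one continuous score per core, and derives a one-to-one core-to-slot placement by ranking those scores, so that all cores are placed in a single step.

```python
def discretize_placement(scores):
    """Map one continuous policy output per core to a permutation of slots.

    The core with the i-th smallest score is assigned slot i, so N continuous
    outputs yield a valid one-to-one placement of N cores in a single step.
    (Hypothetical sketch; the ranking rule is an assumption.)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    placement = [0] * len(scores)
    for slot, core in enumerate(order):
        placement[core] = slot
    return placement

# Example: four cores, continuous policy-network outputs.
print(discretize_placement([0.9, -1.2, 0.3, 2.0]))  # -> [2, 0, 1, 3]
```

Because ranking is invariant to monotone transformations of the scores, small perturbations of the policy output leave the placement unchanged unless two cores swap rank, which keeps the discretization stable during training.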
