Abstract

As deep neural network models become deeper and more complex, processing devices with stronger computing performance and communication capability are required. Following this trend, reliance on multichip many-core systems, which offer high parallelism and reasonable transmission costs, is on the rise. In this work, to improve routing performance of the system in terms of routing runtime and power consumption, we propose a reinforcement learning (RL)-based core placement optimization approach that accounts for application constraints, such as deadlock caused by multicast paths. We leverage the capability of deep RL to learn from indirect supervision as a direct nonlinear optimizer, and the parameters of the policy network are updated by proximal policy optimization. Because the routing topology is naturally a network graph, we use a graph convolutional network to embed its features into the policy network. A single-step environment is designed so that all cores are placed simultaneously. To handle the high-dimensional action space, the policy network outputs continuous values matching the number of cores, which are then discretized to obtain the new placement. For multichip system mapping, we develop a community detection algorithm. We evaluate our agent on several multilayer perceptron and convolutional neural network datasets, and compare the optimal results obtained by our agent against other baselines under different multicast conditions. Our approach achieves a significant reduction in routing runtime, communication cost, and average traffic load, along with deadlock-free performance for intrachip data transmission. The traffic of interchip routing is also significantly reduced after integrating the community detection algorithm into our agent.
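The continuous-to-discrete action mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the policy network emits one continuous score per core, and derives a one-to-one core-to-slot placement by ranking those scores, so that all cores are placed in a single step.

```python
def discretize_placement(scores):
    """Map one continuous policy output per core to a permutation of slots.

    The core with the i-th smallest score is assigned slot i, so N continuous
    outputs yield a valid one-to-one placement of N cores in a single step.
    (Hypothetical sketch; the ranking rule is an assumption.)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    placement = [0] * len(scores)
    for slot, core in enumerate(order):
        placement[core] = slot
    return placement

# Example: four cores, continuous policy-network outputs.
print(discretize_placement([0.9, -1.2, 0.3, 2.0]))  # -> [2, 0, 1, 3]
```

Because ranking is invariant to monotone transformations of the scores, small perturbations of the policy output leave the placement unchanged unless two cores swap rank, which keeps the discretization stable during training.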
