In this paper, we consider the operation of a thermal-aware task scheduler that dispatches tasks from an arrival queue and sets the voltage and frequency of the processing cores to optimize the mean temperature margin of the entire chip (i.e., the cores as well as the NoC routers). We model the scheduler's decision process as a semi-Markov decision problem (SMDP) to account for the most common uncertainties in MPSoC systems, including the stochastic nature of workload inter-arrival times, time-varying workload characteristics, the uncertain chip thermal profile, and random inter-task communications. The SMDP is a general continuous-time optimization framework from stochastic control theory and is a more efficient choice than discrete-time formalisms, which would increase task waiting times and degrade system performance. To solve the formulated SMDP, we propose two reinforcement learning (RL) algorithms that compute the optimal task assignment policy without requiring statistical knowledge of the stochastic dynamics underlying the system states. The proposed algorithms also rely on function approximation techniques to handle the unbounded length of the task queue and the continuous nature of the temperature readings. Compared to previous work, simulations demonstrate a reduction of nearly 6 K in the system's average peak temperature and a 66 ms decrease in mean task service time.
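To make the approach concrete, the following is a minimal sketch (not the paper's actual implementation) of an SMDP Q-learning update with linear function approximation. The feature map, discount rate, action encoding, and all numeric constants are illustrative assumptions; the key SMDP-specific element is discounting the next state's value by the actual sojourn time rather than by a fixed step count.

```python
import numpy as np

BETA = 0.1       # continuous-time discount rate (assumed value)
ALPHA = 0.01     # learning rate (assumed value)
N_ACTIONS = 4    # e.g., (core, V/F level) pairs -- illustrative only

def features(queue_len, temps, action):
    """Map a state-action pair to a fixed-length feature vector.

    Squashing the unbounded queue length and normalizing the
    continuous temperature readings lets one weight vector cover
    the whole state space -- the role function approximation plays.
    """
    phi = np.zeros(3 * N_ACTIONS)
    base = 3 * action
    phi[base + 0] = 1.0                        # per-action bias
    phi[base + 1] = np.tanh(queue_len / 10.0)  # squashed queue length
    phi[base + 2] = np.mean(temps) / 400.0     # normalized temperature (K)
    return phi

w = np.zeros(3 * N_ACTIONS)  # linear weights: Q(s, a) = w . phi(s, a)

def q_value(state, action):
    return w @ features(*state, action)

def smdp_q_update(state, action, reward, tau, next_state):
    """One SMDP Q-learning step.

    `reward` is the return accumulated over the sojourn time `tau`;
    the exp(-BETA * tau) factor discounts the next state's value by
    the elapsed continuous time, which is what distinguishes the
    SMDP update from its discrete-time MDP counterpart.
    """
    global w
    best_next = max(q_value(next_state, a) for a in range(N_ACTIONS))
    target = reward + np.exp(-BETA * tau) * best_next
    td_error = target - q_value(state, action)
    w += ALPHA * td_error * features(*state, action)

# Example: after a task completes, learn from the observed transition.
s  = (5, [345.0, 352.0, 348.0])   # (queue length, per-core temps in K)
s2 = (4, [344.0, 350.0, 347.0])
smdp_q_update(s, action=2, reward=1.3, tau=0.05, next_state=s2)
```

Because the update needs no transition-probability or sojourn-time distributions, it matches the abstract's claim that the policy can be learned without statistical knowledge of the underlying stochastic dynamics.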