Abstract

In this paper, we propose a new dynamic reliability management (DRM) approach based on deep reinforcement learning (DRL) for multi-core processors. The approach accounts for both device reliability effects (hard errors) and transient signal errors (soft errors). The proposed method builds on a recently proposed physics-based three-phase electromigration model and an exponential soft error model that considers the effects of dynamic voltage and frequency scaling (DVFS). This work is inspired by recent advances in DRL for control and game-playing applications. Compared with the traditional Q-learning based method, DRL offers better scalability, a smaller memory footprint, and lower computational complexity. A large set of multi-threaded benchmark applications is used to validate and compare the proposed DRM methods. Experimental results show that the proposed method significantly reduces memory footprint and computational time compared with the traditional Q-learning based method. Furthermore, we show that the DRL-based DRM method saves 53.50% more energy than the Q-learning based method and 61.29% more than the simple DVFS-based method.
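To make the DVFS dependence of the soft error model concrete, the sketch below uses an exponential soft-error-rate form that is common in the DVFS reliability literature. It is an assumption and not necessarily the exact parameterization used in this work. The fault rate grows exponentially as the normalized frequency f (and with it the supply voltage) is lowered toward f_min. The sensitivity exponent d and the base rate lambda0 are hypothetical parameters. Task reliability is then the probability of no soft error during the stretched execution time.

```python
import math


def soft_error_rate(f: float, lambda0: float, d: float, f_min: float) -> float:
    """Assumed exponential soft-error-rate model:
    lambda(f) = lambda0 * 10 ** (d * (1 - f) / (1 - f_min)),
    where f is the frequency normalized to f_max (f_min <= f <= 1).
    """
    if not (0.0 < f_min < 1.0) or not (f_min <= f <= 1.0):
        raise ValueError("require 0 < f_min < 1 and f_min <= f <= 1")
    return lambda0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))


def task_reliability(f: float, exec_time_at_fmax: float,
                     lambda0: float, d: float, f_min: float) -> float:
    """Probability of completing a task with no soft error (Poisson faults).
    Running at normalized frequency f stretches execution time to T / f.
    """
    rate = soft_error_rate(f, lambda0, d, f_min)
    return math.exp(-rate * exec_time_at_fmax / f)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    for f in (1.0, 0.8, 0.6, 0.4):
        r = task_reliability(f, exec_time_at_fmax=100.0,
                             lambda0=1e-6, d=3.0, f_min=0.4)
        print(f"f={f:.1f}  reliability={r:.6f}")
```

Under this assumed model, lowering the frequency saves energy but reduces reliability in two ways: a higher fault rate and a longer exposure time. A DRM policy therefore has to trade off energy against soft-error (and electromigration) reliability when it chooses DVFS levels.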
