Abstract

In this paper, we propose a new dynamic reliability management technique for multicore processors based on a phase-driven $Q$-learning method. Our technique considers a wide range of long-term reliability issues and maximizes processor throughput subject to a reliability constraint. We employ ON/OFF switching actions and dynamic voltage and frequency scaling as control knobs (i.e., working modes) to tune the state of the processor's cores. To achieve this, our technique detects program phases and adaptively determines the optimal working mode for each phase using $Q$-learning. By integrating phase detection into the $Q$-learning-based management, our technique can efficiently manage programs with highly diverse phases. We also propose three additional modules to improve the management efficiency of our technique. To evaluate our technique, we use it to manage a 3-D CPU running programs with highly diverse phases. Several failure mechanisms are considered in this case study. Our proposed technique is compared with two existing $Q$-learning-based techniques. The experimental results demonstrate that when the number of phases is smaller than the number of working modes, our technique can achieve more than $1.36\times$ improvement in performance with 60% memory space savings.
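To make the idea of phase-driven $Q$-learning concrete, the following is a minimal sketch and not the paper's implementation. It assumes a tabular $Q$-learner that keeps one table per detected phase, three hypothetical working modes for a single core (`off`, `low_vf`, `high_vf`), a single placeholder state, and a toy reward that returns throughput when the reliability constraint holds and a fixed penalty otherwise. The phase sequence, throughput values, and all names are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical working modes for one core: OFF plus two DVFS levels.
MODES = ["off", "low_vf", "high_vf"]


class PhaseQLearner:
    """Tabular Q-learning with one Q-table per program phase (illustrative)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # q[phase][state] is a list of action values, one per working mode.
        self.q = defaultdict(lambda: defaultdict(lambda: [0.0] * len(MODES)))

    def greedy(self, phase, state):
        values = self.q[phase][state]
        return values.index(max(values))

    def choose(self, phase, state):
        # Epsilon-greedy exploration over the working modes.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(MODES))
        return self.greedy(phase, state)

    def update(self, phase, state, action, reward, next_state):
        # Standard Q-learning update, bootstrapping within the same phase
        # table (an assumption made for this sketch).
        values = self.q[phase][state]
        best_next = max(self.q[phase][next_state])
        values[action] += self.alpha * (
            reward + self.gamma * best_next - values[action]
        )


def reward(throughput, reliability_ok):
    # Assumed reward shape: throughput if the constraint holds, else a penalty.
    return throughput if reliability_ok else -1.0


def demo(steps=5000):
    # Toy environment (entirely hypothetical):
    # (throughput, reliability_ok) for each phase and working mode.
    env = {
        "compute": {"off": (0.0, True), "low_vf": (0.5, True), "high_vf": (1.0, False)},
        "memory": {"off": (0.0, True), "low_vf": (0.4, True), "high_vf": (0.8, True)},
    }
    agent = PhaseQLearner()
    state = "nominal"  # placeholder; a real state would encode reliability status
    for step in range(steps):
        # Stand-in for phase detection: alternate phases every 50 steps.
        phase = "compute" if (step // 50) % 2 == 0 else "memory"
        action = agent.choose(phase, state)
        throughput, ok = env[phase][MODES[action]]
        agent.update(phase, state, action, reward(throughput, ok), state)
    return {phase: MODES[agent.greedy(phase, state)] for phase in env}
```

Calling `demo()` returns the learned greedy working mode for each toy phase. Keeping a separate table per phase lets the learner retain what it learned about one phase while the program is executing another, which is the intuition behind combining phase detection with $Q$-learning.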
