Abstract

This paper studies the problem of maximizing multicore system lifetime reliability, an important design consideration for many real-time embedded systems. Existing work has investigated the problem but has neglected important failure mechanisms. Furthermore, most existing algorithms are too slow for online use and thus cannot address runtime workload and environment variations. This paper presents an online framework that maximizes system lifetime reliability through reliability-aware utilization control. The framework targets homogeneous multicore soft real-time systems and selectively employs a comprehensive reliability estimation tool to account for a variety of failure mechanisms at the system level. A model-predictive controller adjusts utilization by manipulating core frequencies, thereby reducing temperature, and an online heuristic adjusts the controller sampling window length to reduce the reliability degradation caused by thermal cycling. Experiments with a real quad-core ARM processor and a simulator demonstrate that the proposed approach improves system mean time to failure by 50% on average and 141% in the best case, compared with existing techniques.

