Abstract

The ability to maintain functional and temporal correctness in the presence of faults is a key requirement in many safety-critical embedded systems. This work proposes an efficient fault recovery mechanism for real-time multiprocessor systems scheduled using a low-overhead, semi-partitioned, optimal proportional fair scheduling technique. We assume a system that can handle a single permanent processor fault at any time using cold backups, with a pre-specified activation/recovery time following the detection of a fault. During such recovery periods the system may suffer transient overloads, potentially leading to unacceptable fairness deviations and the consequent rejection or early termination of critical jobs. The proposed fault-tolerant scheduler, called Fault Tolerant Fair Scheduler (FT-FS), attempts to minimize such job rejections and terminations during recovery by judiciously redistributing the slack accumulated by a subset of jobs, thereby delivering more sustainable performance. Experimental results show that the proposed FT-FS algorithm performs well even under high system loads. The practical applicability of the proposed scheme is illustrated through a case study on an aircraft flight control system.
