Deterministic execution of multithreaded applications for reliability of multicore systems

Hamid Mushtaq

doi:10.4233/uuid:2b48115c-19b0-4301-bcbe-b3e8ce6aacba

Abstract

Constant reduction in the size of transistors has made it possible to implement many cores on a single die. However, smaller transistors are more susceptible to both temporary and permanent faults. To make such systems more reliable, online fault tolerance techniques can be applied. A common approach to fault tolerance is redundant execution of the software through program replication. In this approach, the replicated copies of a program (known as replicas) follow the same execution sequence and produce the same output if given the same input. This requires the replicas to handle non-deterministic events, such as asynchronous signals and non-deterministic functions, deterministically. This is usually done by having one replica log the non-deterministic events and having the other replicas replay them at the same point in program execution. In a shared memory multithreaded program, the replicas must also perform non-deterministic shared memory accesses deterministically, so that they do not diverge in the absence of faults. In this thesis, we employ two techniques for doing so: record/replay and deterministic multithreading. Both of our schemes are implemented as a user-level library and do not require a modified kernel. Moreover, they are portable, since they do not depend on any special hardware for deterministic execution. We compare the advantages and disadvantages of both schemes in terms of performance, memory consumption and reliability. We also show how our techniques improve upon existing techniques in terms of performance, scalability and portability. Lastly, we implement specialized hardware extensions to further improve the performance and scalability of deterministic multithreading.
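The log-and-replay idea for non-deterministic events can be illustrated with a minimal sketch. It is not the library described in this thesis: the names `EventLog`, `leader_source` and `follower_source` are hypothetical, a random number stands in for a non-deterministic function call, and the two replicas run one after the other rather than concurrently.

```python
import random
from collections import deque


class EventLog:
    """Hypothetical FIFO log of non-deterministic results (illustration only)."""

    def __init__(self):
        self._entries = deque()

    def record(self, value):
        self._entries.append(value)

    def replay(self):
        # Entries are consumed in the order in which the leader recorded them.
        return self._entries.popleft()


def run_replica(nondet, steps):
    """Toy program whose output depends on a non-deterministic source."""
    total = 0
    for _ in range(steps):
        total = total * 31 + nondet()
    return total


def leader_source(log):
    """The leader performs the real non-deterministic call and logs its result."""
    def nondet():
        value = random.randrange(1000)  # stand-in for a non-deterministic function
        log.record(value)
        return value
    return nondet


def follower_source(log):
    """The follower never performs the call itself; it replays the logged result."""
    return log.replay


log = EventLog()
leader_out = run_replica(leader_source(log), steps=5)
follower_out = run_replica(follower_source(log), steps=5)
print(leader_out == follower_out)
```

Because the follower consumes the logged values at the same points in its execution, both replicas compute the same output even though the underlying source is random. Any divergence between the two outputs would then point to a fault rather than to non-determinism.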
Deterministic multithreading is useful not only for fault tolerance, but also for debugging and testing of multithreaded applications running on a multicore system. It can also reduce the time needed to calculate the worst-case execution time (WCET) of tasks running on multicore systems, as deterministic multithreading reduces the number of states a multithreaded program can reach. Finding a good (less pessimistic) WCET estimate of a real-time task is much simpler if it runs on a single-core processor than if it runs on a multicore processor concurrently with other tasks. This is because those tasks can share resources, such as a shared cache or a shared bus, and may need to concurrently read or write shared data. In this thesis, we show that using deterministic shared memory accesses helps reduce the number of states considered by the estimation algorithm and therefore reduces the WCET calculation time. Moreover, we implement optimizations that further reduce the WCET calculation time and yield a tighter WCET estimate, in addition to utilizing our specialized hardware extensions for that purpose.
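The effect of a fixed access order on the number of reachable states can be shown with a toy enumeration. This is only a sketch under stated assumptions, not the WCET estimation algorithm of this thesis: two hypothetical threads each apply two operations to one shared variable, and the round-robin order used for the deterministic case is chosen arbitrarily for illustration.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def final_value(schedule, x=0):
    """Apply the shared-variable operations in the given order."""
    for op in schedule:
        x = op(x)
    return x


# Hypothetical threads: each entry is one access to the shared variable.
thread_a = [lambda x: x + 1, lambda x: x * 2]
thread_b = [lambda x: x + 3, lambda x: x * 5]

# Unconstrained execution: any interleaving may occur.
all_schedules = list(interleavings(thread_a, thread_b))
free_outcomes = {final_value(s) for s in all_schedules}

# Deterministic execution: one fixed order (round-robin, as an assumption).
det_schedule = [thread_a[0], thread_b[0], thread_a[1], thread_b[1]]
det_outcomes = {final_value(det_schedule)}

print(len(all_schedules), sorted(free_outcomes), sorted(det_outcomes))
```

With unconstrained accesses, an analysis has to consider every schedule and every resulting value of the shared variable. Once the access order is fixed, only one schedule and one final value remain to be analysed.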

Full Text