On-the-fly healing of race conditions in ARINC-653 flight software

Ok-Kyoon Ha,Guy Martin Tchamgoue,Yong-Kee Jun,Jeong-Bae Suh

doi:10.1109/dasc.2010.5655315

Abstract

The ARINC-653 standard architecture for flight software specifies an application executive (APEX) which provides an application programming interface and defines a hierarchical framework which provides health management for error detection and recovery. In every partition of the architecture, however, asynchronously concurrent processes or threads may include concurrency bugs such as unintended race conditions which are common and difficult to remove by testing. A race condition toward a shared data, or data race, is a pair of unsynchronized instructions that access a shared variable with at least one write access. Data races threaten the reliability of shared-memory programs seriously and latently, because they result in unintended nondeterministic executions of the programs. To heal data race during executions of ARINC-653 flight software, this paper instruments on-the-fly race detection into the target program and incorporates on-the-fly race healing into the health management of the ARINC-653 architecture. The race detection signals to the health monitor using the corresponding APEX call, if a data race is detected. The health monitor then responds by invoking an aperiodic, user-defined, error handling process that is assigned the highest possible priority. This special process uses an APEX call to identify and then heals the occurrence of race condition as an application error, one of seven error types defined by ARINC-653. This race-healing process allows the target programs to be assured at run-time that the execution result of the healed program could have been in the original program and therefore no new functional bug has been introduced. This paper evaluates efficiencies of the on-the-fly mechanisms to argue that they are practical to be configured under the ARINC-653 partitions.

Full Text