Modelling processor reliability using LLVM compiler fault injection

Y Nezzari, C P Bridges

doi:10.1109/aero.2018.8396489

Publication Date: Mar 1, 2018

Citations: 3

Affiliation: University of Surrey

Abstract

The use of commercial off-the-shelf (COTS) processors is increasingly attractive for the space domain, especially with emerging high-demand applications in Earth observation and communications. An order-of-magnitude improvement in on-board processing capability with less size, mass, and power is possible; however, COTS parts still lag in terms of reliability in the space environment, and costly protection techniques are required to ensure resilience to single event effects (SEEs). Whilst current software reliability techniques are only capable of detecting errors and performing partial recovery, our research offers a step change for both error detection and recovery without degradation in fault coverage, targeting modern multicore processors. We have previously shown how to create additional passes in the compiler's intermediate representation layer to automatically add differing protection codes at compile time using the LLVM compiler framework. LLVM supports multiple processor architectures and multiple high-level languages, meaning the approach can be ported not just to space applications but also to aerospace, defence, medical, and automotive applications. In this paper, a new LLVM fault injection tool is presented to validate and measure software protection methods, either statically at compile time or dynamically at runtime, for multiple error types such as silent data corruption (SDC), control-flow errors, and crashes. We use our tool to inject faults into unprotected and protected codes and make quantitative comparisons of the errors and the associated statistical confidence. Our protection method shows high coverage, up to 100% for some benchmarks, and does not assume that the memory system is protected via typical triple modular redundancy (TMR) hardware approaches. This means that we protect all memory instructions that perform reads and writes. Another reason for the high coverage is the inclusion of multiple data and instruction types (i32, i32*, i1, i8, i8*, i64, float, double, and float and double pointers).
This research has been implemented on two processing architectures: an Intel Core i5-3470 running at 3.2 GHz and a Raspberry Pi 3. On the first platform the overhead was less than 15%, and on the second it was less than 17%.
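
To make the fault-injection terminology in the abstract concrete, the following is a minimal sketch of a single-bit-flip campaign in plain Python. It is not the authors' LLVM tool and does not operate on LLVM IR: the toy workload, every function name, the "benign" outcome label, and the choice of a Wilson score interval for the statistical confidence are all assumptions made for illustration only.

```python
import math
import random


def flip_bit_i32(value, bit):
    """Flip one bit of a value treated as a 32-bit two's-complement integer."""
    raw = (value & 0xFFFFFFFF) ^ (1 << bit)
    return raw - (1 << 32) if raw & 0x80000000 else raw


def workload(data, fault=None):
    """Hypothetical kernel: table lookup followed by a running maximum.

    fault = (index, bit) corrupts the value loaded at that index,
    standing in for a single event upset on one memory read.
    """
    table = list(range(0, 160, 10))  # 16 entries: 0, 10, ..., 150
    best = 0
    for i, x in enumerate(data):
        if fault is not None and fault[0] == i:
            x = flip_bit_i32(x, fault[1])
        best = max(best, table[x])  # an out-of-range x raises IndexError
    return best


def classify(golden, data, fault):
    """Label one injected run as 'crash', 'sdc', or 'benign' (masked)."""
    try:
        out = workload(data, fault)
    except Exception:
        return "crash"
    return "benign" if out == golden else "sdc"


def campaign(data, trials, seed=0):
    """Inject one random single-bit fault per trial and tally the outcomes."""
    rng = random.Random(seed)
    golden = workload(data)  # fault-free reference output
    counts = {"benign": 0, "sdc": 0, "crash": 0}
    for _ in range(trials):
        fault = (rng.randrange(len(data)), rng.randrange(32))
        counts[classify(golden, data, fault)] += 1
    return counts


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    data = [3, 7, 1, 12]
    counts = campaign(data, trials=1000)
    low, high = wilson_interval(counts["sdc"], 1000)
    print(counts, "SDC rate interval:", (low, high))
```

Running the same campaign against an unprotected and a protected version of a workload, and comparing the resulting outcome rates and their intervals, is the kind of quantitative comparison the abstract refers to; the protection scheme itself is not sketched here because the abstract does not describe it.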
