Abstract

All-Programmable System-on-Chips (APSoCs) constitute a compelling option for deploying applications in radiation environments thanks to their high computing performance and power efficiency. Despite these advantages, APSoCs are sensitive to radiation like any other electronic device. Processors embedded in APSoCs therefore have to be adequately hardened against ionizing radiation to make them a viable design choice for harsh environments. This paper proposes a novel lockstep-based approach to harden the dual-core ARM Cortex-A9 processor in the Xilinx Zynq-7000 APSoC against radiation-induced soft errors by coupling it with a MicroBlaze TMR subsystem in the programmable logic (PL) layer of the Zynq. The proposed technique uses checkpointing along with roll-back and roll-forward mechanisms at the software level (i.e. software redundancy), as well as processor replication and checker circuits at the hardware level (i.e. hardware redundancy). Results of fault injection experiments show that the proposed approach achieves high levels of protection against soft errors by mitigating around 98% of bit-flips injected into the register files of both ARM cores while keeping timing performance overhead as low as 25% if block and application sizes are adjusted appropriately. Furthermore, incorporating the roll-forward recovery operation in addition to the roll-back operation improves the Mean Workload between Failures (MWBF) of the system by up to ≈19%, depending on the nature of the running application. When a fault occurs, the application can proceed faster if it is treated with the roll-forward operation rather than the roll-back operation, so relatively more data can be processed before the next error occurs in the system.
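The interplay of checkpointing, roll-back, and roll-forward described in the abstract can be illustrated with a small Python model. This is a sketch under stated assumptions, not the authors' implementation: the names `majority` and `run_block` and the tuple used as a register snapshot are hypothetical, and the model assumes that roll-forward means continuing from the state on which two of the three replicas agree, while roll-back means returning to the last checkpoint when no two replicas agree.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for a core's register-file snapshot.
State = Tuple[int, ...]
Fault = Optional[Callable[[State], State]]


def majority(states: Sequence[State]) -> Optional[State]:
    """Return the state held by at least two replicas, or None if all differ."""
    value, count = Counter(states).most_common(1)[0]
    return value if count >= 2 else None


def run_block(step: Callable[[State], State],
              checkpoint: State,
              faults: Sequence[Fault] = (None, None, None)) -> Tuple[State, str]:
    """Run one code block on three replicas and decide how to proceed.

    Returns the state to continue from and the action taken.
    """
    results = []
    for fault in faults:
        state = step(checkpoint)      # each replica starts from the checkpoint
        if fault is not None:
            state = fault(state)      # optionally corrupt this replica
        results.append(state)

    agreed = majority(results)
    if agreed is None:
        # Assumed roll-back: no majority, so re-execute from the checkpoint.
        return checkpoint, "roll-back"
    if all(r == agreed for r in results):
        # All replicas match: the result becomes the new checkpoint.
        return agreed, "commit"
    # Assumed roll-forward: the odd replica adopts the majority state.
    return agreed, "roll-forward"
```

In this model a single faulty replica costs no re-execution, which is consistent with the abstract's observation that roll-forward lets the application proceed faster than roll-back after a fault.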

Highlights

  • Cleaning up the legacy of nuclear waste is one of Europe's most critical and complicated environmental remediation projects, which is expected to cost as much as £220bn over 120 years [1]

  • Another conclusion can be drawn from Table 7: as more code execute blocks are employed within an application, that is to say, as the application size grows, the timing performance overheads of the triple-core lockstep (TCLS) design become more favourable for the given matrix size

  • Although the time overheads associated with the TCLS design may not suit some hard real-time systems, they would be tolerable for many systems requiring high reliability and dependability under harsh environments once block and application sizes are appropriately adjusted through trial and error based on the nature of the given application program


Summary

Introduction

Cleaning up the legacy of nuclear waste is one of Europe's most critical and complicated environmental remediation projects, which is expected to cost as much as £220bn over 120 years [1]. Many mission-critical applications could be implemented in All-Programmable Systems-on-Chips (APSoCs), which combine a programmable logic (PL) layer (i.e. an SRAM-based FPGA layer) with embedded processors in the processor subsystem (PS) layer. Such APSoCs enjoy the merits of higher performance, lower energy consumption, and favourable time-to-market and cost [5]. These highly integrated circuits, which involve a set of homogeneous or heterogeneous processor cores, are very susceptible to transient faults that might even lead to total system failures. Experiments indicate that the triple-core lockstep (TCLS) approach applied to the dual-core ARM Cortex-A9 processor embedded in the Xilinx Zynq-7000 APSoC is capable of mitigating around 98% of the injected bit-flips while keeping the timing performance overhead under fault-free conditions as low as 25% when certain conditions are satisfied.
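To make the notion of an injected bit-flip concrete, the following Python sketch models a single-event upset in a register file. It is only an illustration under assumptions: the function name `inject_bit_flip`, the list-of-integers register model, and the 32-bit width are hypothetical and do not describe the fault injection tooling used in the paper.

```python
import random
from typing import List, Tuple


def inject_bit_flip(registers: List[int],
                    rng: random.Random,
                    width: int = 32) -> Tuple[List[int], int, int]:
    """Return a copy of the register file with one randomly chosen bit inverted.

    Also returns the index of the affected register and the bit position,
    so that a campaign can log where each fault was injected.
    """
    reg = rng.randrange(len(registers))   # which register is hit
    bit = rng.randrange(width)            # which bit inside that register
    faulty = list(registers)              # leave the original untouched
    faulty[reg] ^= 1 << bit               # XOR with a one-hot mask flips the bit
    return faulty, reg, bit
```

Passing a seeded `random.Random` instance keeps a fault injection campaign reproducible, which helps when comparing mitigation rates across runs.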

Background
Radiation effects on electronics
Effects of soft errors in processors
Fault-tolerance techniques
Lockstep technique
Proposed triple-core lockstep technique
Architecture
Methodology
Interrupt implementation
Consistency check and checkpoint implementations
Fault injection technique
Implementation and experimental results
Resource consumption analysis
Timing performance analysis for matrix multiplication benchmarks
Fault-injection performance analysis for matrix-multiplication benchmarks
Fault-injection performance analysis for 256-bit AES encryption benchmarks
Findings
Conclusion
