Abstract
All-Programmable System-on-Chips (APSoCs) constitute a compelling option for employing applications in radiation environments thanks to their high-performance computing and power efficiency merits. Despite these advantages, APSoCs are sensitive to radiation like any other electronic device. Processors embedded in APSoCs, therefore, have to be adequately hardened against ionizing-radiation to make them a viable choice of design for harsh environments. This paper proposes a novel lockstep-based approach to harden the dual-core ARM Cortex-A9 processor in the Xilinx Zynq-7000 APSoC against radiation-induced soft errors by coupling it with a MicroBlaze TMR subsystem in the programmable logic (PL) layer of the Zynq. The proposed technique uses the concepts of checkpointing along with roll-back and roll-forward mechanisms at the software level, i.e. software redundancy, as well as processor replication and checker circuits at the hardware level (i.e. hardware redundancy). Results of fault injection experiments show that the proposed approach achieves high levels of protection against soft errors by mitigating around 98% of bit-flips injected into the register files of both ARM cores while keeping timing performance overhead as low as 25% if block and application sizes are adjusted appropriately. Furthermore, the incorporation of the roll-forward recovery operation in addition to the roll-back operation improves the Mean Workload between Failures (MWBF) of the system by up to ≈19% depending on the nature of the running application, since the application can proceed faster, in a scenario where a fault occurs, when treated with the roll-forward operation rather than roll-back operation. Thus, relatively more data can be processed before the next error occurs in the system.
Highlights
Cleaning up the legacy of nuclear waste is one of Europe's most critical and complicated environmental remediation projects, which is expected to cost as much as £220bn over the 120 years [1]
Another conclusion can be drawn from Table 7; as more code execute blocks are employed within an application, that is to say, as the appli cation size grows, timing performance overheads of triple-core lockstep (TCLS) design become more favourable for the given matrix size
Time over heads associated with TCLS design may not suit some hard real-time systems, these overheads would be tolerable for many systems requiring high reliability and dependability under harsh environments once block and application sizes are appropriately adjusted through trial and error based on the nature of the given application program
Summary
Cleaning up the legacy of nuclear waste is one of Europe's most critical and complicated environmental remediation projects, which is expected to cost as much as £220bn over the 120 years [1]. Many mission-critical applica tions could have been implemented in All-Programmable Systems-onChips (APSoCs) which combine programmable logic (PL) layer (i.e. SRAM-based FPGA layer) with embedded processors in the processor subsystem (PS) layer Such APSoCs enjoy the merits of higher perfor mance, lower energy consumption, and favourable time-to-market and cost [5]. These highly-integrated circuits, which involve a set of homogeneous or heterogeneous processor cores, are very sus ceptible to transient faults that might even lead to total system failures. Experiments indicate that the TCLS approach applied to the dual-core ARM Cortex-A9 processor embedded in Xilinx Zynq-7000 APSoC is capable of mitigating around 98% of the bit-flips injected while keeping the timing performance overhead as low as 25%, when certain condi tions are satisfied, under fault-free conditions.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have