Abstract

In this paper, we propose a reliability optimization technique for low earth orbit (LEO) satellite systems that operate under widely varying temperatures. The proposed technique hardens the target satellite on-board computer against two common reliability threats, soft errors and hard errors, at the same time. Building on an existing fault-tolerant scheduling technique, we show how the reliability and lifetime of the system can be quantified with respect to a given error-handling policy. Then, using these models, we perform extensive reliability analyses and optimize the degree of hardening for different functional safety requirements. We implemented the proposed technique in a widely adopted real-time operating system (RTOS), Real-Time Executive for Multiprocessor Systems (RTEMS), running on a real satellite on-board computer system, GR712RC. The proposed technique enables efficient exploration of the trade-off between soft-error reliability (failure rate), hard-error reliability (expected lifetime), and other resource utilization indicators, such as power consumption or CPU utilization. In particular, the proposed technique could be successfully used to co-optimize lifetime and fault tolerance with respect to the unique ambient temperature profile of LEO satellites.

Highlights

  • It is increasingly important to satisfy the reliability requirements in the design of real-time embedded systems

  • We implement a known redundancy-based scheduling technique on top of an existing real-time operating system (RTOS) that is commonly used for satellite systems and propose to judiciously determine the scheduling redundancy for low earth orbit (LEO) satellite systems in order to enhance the soft-error resilience without deteriorating the hard-error resilience

  • The rest of this paper is organized as follows: we review the previous research on reliability-aware system design


Summary

INTRODUCTION

It is increasingly important to satisfy the reliability requirements in the design of real-time embedded systems. Recent small satellites tend to be equipped with high-end processors to handle artificial intelligence (AI) and machine learning (ML) workloads, as exemplified by SmartSat from Lockheed Martin [20], which uses an Nvidia Jetson processor. When such ordinary commercial-off-the-shelf (COTS) embedded boards are used in the design of satellite systems for the sake of high performance, space-specific hardware hardening techniques may not be available. In this paper, we consider both hardware- and software-based solutions in real-time satellite embedded systems operating in a widely varying temperature environment. We implement a known redundancy-based scheduling technique on top of an existing real-time operating system (RTOS) that is commonly used for satellite systems, and we propose to judiciously determine the scheduling redundancy for LEO satellite systems in order to enhance the soft-error resilience without deteriorating the hard-error resilience. The overall flow of the proposed reliability evaluation/optimization procedure is depicted in Figure 2, along with the section organization.

RELATED WORK
WORKLOAD MODEL
TEMPERATURE MODEL
SOFTWARE ERROR DETECTION AND CORRECTION
FAULT-TOLERANT SCHEDULING
SOFT-ERROR RELIABILITY QUANTIFICATION
HARD-ERROR RELIABILITY QUANTIFICATION
CONCLUSION
