Abstract

As the backbone of Industry 4.0, industrial cyber-physical systems (ICPSs), which are geographically dispersed, federated, cooperative, and security-critical, have become a center of interest for both industry and academia. An ICPS contains a large number of devices, such as sensors and actuators, that are embedded and networked together to improve the performance of real-time monitoring and control. Reliability and temperature are two important concerns for these embedded, networked devices because of their stringent requirements for reliable execution and long lifespan. In this article, we study the problem of maximizing the soft-error reliability of CPU- and GPU-integrated embedded platforms deployed in ICPSs under a temperature constraint. To speed up the estimation of soft-error rate (SER) and temperature, we train an artificial neural network (ANN) that quickly and accurately derives the system’s SER and temperature. To solve the temperature-constrained reliability optimization problem, we propose a feedback control-based task scheduling scheme that adaptively determines the number of tasks admitted to the system and the number of replicas for the admitted tasks. We perform a series of simulation experiments to verify the efficacy of our scheme. The experimental results demonstrate that 1) the SER and temperature estimated by our ANN-based method are very close to the ground-truth data and 2) our proposed feedback control-based task scheduling method improves system reliability by up to 184.2% with a lower peak temperature when compared with one baseline and two state-of-the-art methods.

Note to Practitioners: This article is motivated by safety-critical ICPS applications that require reliable execution and long lifespan, which can be achieved by increasing reliability and controlling operating temperature. Our goal is to improve the reliability of CPU- and GPU-integrated multiprocessor systems-on-chip (MPSoCs) deployed in ICPSs under a temperature constraint. Most existing papers target either reliability or temperature. A few recent papers optimize reliability and temperature simultaneously; however, they are not designed for ICPSs and do not consider the widely adopted CPU- and GPU-integrated MPSoC platforms. This article proposes a machine learning-based approach that trains an ANN to enable online estimation of system SER and temperature. Compared with offline estimation using simulation tools, the online approach is better suited to real-time ICPS applications. This article also designs a feedback control-based approach that improves system reliability and reduces the peak temperature of CPU- and GPU-integrated MPSoCs by determining the number of tasks to admit and the number of replicas for each admitted task.
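
The abstract does not give the controller's equations, so the following is only a minimal sketch of the general idea of a temperature-driven feedback loop that adjusts the number of admitted tasks and the number of replicas per task. Every name and constant here is a hypothetical illustration, not the authors' method: `ControllerState`, `feedback_step`, the dead-band `margin`, the order in which tasks and replicas are adjusted, and `toy_temperature`, a linear stand-in for the trained ANN estimator.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    admitted: int  # number of tasks currently admitted (hypothetical field)
    replicas: int  # replicas executed per admitted task (hypothetical field)


def feedback_step(state, est_temp, temp_limit, max_tasks, max_replicas, margin=5.0):
    """One control period: compare the estimated peak temperature with the
    limit and adjust the workload by a single step.

    Assumed policy (not taken from the article): with thermal headroom, add
    replicas first and then admit tasks; when over the limit, drop a task
    first and then a replica. `margin` is a dead band that suppresses
    oscillation around the limit.
    """
    error = temp_limit - est_temp  # positive means thermal headroom
    admitted, replicas = state.admitted, state.replicas
    if error < 0:
        # Over the limit: shed load.
        if admitted > 1:
            admitted -= 1
        elif replicas > 1:
            replicas -= 1
    elif error > margin:
        # Comfortable headroom: add redundancy, then more tasks.
        if replicas < max_replicas:
            replicas += 1
        elif admitted < max_tasks:
            admitted += 1
    # Otherwise the estimate is inside the dead band: hold the configuration.
    return ControllerState(admitted, replicas)


def toy_temperature(state, ambient=45.0, per_copy=1.5):
    """Toy stand-in for the ANN estimator: temperature grows linearly with
    the number of task copies being executed. Purely illustrative."""
    return ambient + per_copy * state.admitted * state.replicas


def run(periods=50, temp_limit=80.0):
    """Iterate the loop from a minimal configuration for a fixed number of periods."""
    state = ControllerState(admitted=1, replicas=1)
    for _ in range(periods):
        state = feedback_step(
            state, toy_temperature(state), temp_limit,
            max_tasks=16, max_replicas=3,
        )
    return state


if __name__ == "__main__":
    final = run()
    print(final, toy_temperature(final))
```

With these toy numbers the loop settles at 7 admitted tasks with 3 replicas each, at an estimated 76.5 degrees against a limit of 80. The dead band only prevents oscillation here because it is wider than the temperature change caused by one adjustment step; a real controller would need a properly tuned gain and a validated estimator.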
