Cloud computing has revolutionized IT delivery by offering scalable, on-demand internet services encompassing software, platforms, and infrastructure. However, the vast operational scale of cloud services makes them susceptible to failures that significantly degrade performance. Implementing fault tolerance in dynamic cloud services is a key challenge, with complex configurations and dependencies complicating deployment. This paper introduces an approach that combines double deep Q-learning (DDQL) with a dynamic fault-tolerant real-time scheduling algorithm (DFTRTSA) to enhance fault tolerance in real-time systems. DDQL, an extension of deep Q-learning, optimizes the fault-tolerance decision-making process, and the algorithm adjusts scheduling strategies dynamically based on system conditions and errors. The fusion of DDQL and DFTRTSA aims to create a resilient and adaptive fault-tolerant mechanism that ensures uninterrupted operation while meeting real-time requirements. As demonstrated through experiments, this adaptive approach efficiently manages resources, meets deadlines, and gracefully handles errors. Our DDQL-DFTRTSA method outperforms conventional fault-tolerant mechanisms in fault tolerance, energy efficiency, downtime reduction, and system dependability, making it well suited to real-time systems in dynamic and unpredictable environments.
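To make the DDQL component concrete, the sketch below shows the standard double deep Q-learning target, in which the online network selects the next action and the target network evaluates it. This is a minimal illustration under assumed names (the toy weight matrices, `gamma`, and the scheduling-reward transition are illustrative), not the paper's implementation.

```python
# Minimal sketch of the double deep Q-learning (DDQL) target that could
# drive fault-tolerance scheduling decisions. The toy q-networks, state
# encoding, and reward are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
num_states, num_actions, gamma = 5, 3, 0.95

# Stand-ins for the online and target Q-networks: simple weight matrices
# mapping an integer state index to per-action Q-values.
W_online = rng.normal(size=(num_states, num_actions))
W_target = rng.normal(size=(num_states, num_actions))

def q_values(W, state):
    """Q-values for every action in the given state."""
    return W[state]

def ddql_target(reward, next_state, done):
    """Double DQN target: the online net selects the next action,
    the target net evaluates it, reducing Q-value overestimation."""
    if done:
        return reward
    best_action = int(np.argmax(q_values(W_online, next_state)))
    return reward + gamma * q_values(W_target, next_state)[best_action]

# Example transition: reward observed after one scheduling decision.
print(ddql_target(reward=1.0, next_state=2, done=False))
```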