Cloud manufacturing networks operate in stochastic environments, making them susceptible to various disruptions. Designing a resilient cloud manufacturing network that can withstand and recover from disruptions while maintaining efficient service composition is therefore crucial. This paper emphasizes the importance of developing resilient cloud manufacturing networks, particularly in the face of unprecedented challenges such as the COVID-19 pandemic. It explores the complexities of service composition using reinforcement learning (RL) algorithms, specifically Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). Among these, SAC emerges as the standout performer, demonstrating superior capability in enhancing network resilience. The model is verified by investigating and comparing the convergence behavior of the algorithms. Beyond examining these RL methodologies, the research extends to the ongoing development and evaluation of a robust network, and the model is applied to a COVID-19 case study to validate the results. Beyond theoretical frameworks, this work highlights the need for adaptive and resilient manufacturing systems, offering insights into real-world challenges. The results emphasize the significance of RL algorithms, particularly when customer satisfaction is the top priority; moreover, when logistics costs and the costs of changing servers are lower than the cost of customer dissatisfaction, the use of RL algorithms becomes essential for addressing disruption scenarios. Finally, the findings underscore the critical role of collaboration within intertwined supply networks in helping a disrupted network withstand and recover from disruptions.