In this work, an automated member design method based on deep reinforcement learning (DRL) is proposed. Using steel–concrete composite beam design as a case study, the design procedure is conceptualized as a sequential optimization process and modeled as a Markov Decision Process (MDP). A design agent equipped with two deep neural networks is trained using the proximal policy optimization (PPO) method to complete the complex design task of steel–concrete composite beams while adhering to the provisions of the Chinese Standard for Design of Steel Structures (GB50017-2017). These structural provisions are integrated into a simulated design environment that responds to the agent’s actions, and a universal reward shaping function is introduced to guide the agent toward feasible designs while minimizing material cost. The design quality and generation performance of the trained PPO agent are compared with those of Differential Evolution (DE) on 100 random training cases, 100 random testing cases, and 95 real-world engineering cases, demonstrating significant promise and reduced time consumption for automated steel–concrete composite beam design. An approach combining PPO with DE is further proposed to enhance the robustness and reliability of the original agent. Finally, the essence and prospects of the proposed method are briefly discussed.
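To make the MDP formulation concrete, the sketch below shows one way such a design environment and PPO training loop could be wired together. It is not the authors' implementation: the Gymnasium-style environment, the stable-baselines3 PPO trainer, the 3-variable episode, and the placeholder `_check_provisions()` and cost proxy (standing in for the GB50017-2017 verification and material cost model) are all illustrative assumptions.

```python
"""Minimal sketch of the MDP formulation described in the abstract.

Hypothetical assumptions: a Gymnasium-style environment, stable-baselines3's
PPO (an actor and a critic network, matching the two-network agent), and toy
placeholders for the code checks and material cost.
"""
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class CompositeBeamEnv(gym.Env):
    """Each step fixes one discrete design variable; the episode ends once
    all variables are chosen and the candidate design is verified."""

    N_VARS = 3       # e.g., section index, slab thickness, stud spacing (hypothetical)
    N_CHOICES = 10   # discrete options per variable (hypothetical)

    def __init__(self):
        self.observation_space = spaces.Box(0.0, 1.0, shape=(self.N_VARS,), dtype=np.float32)
        self.action_space = spaces.Discrete(self.N_CHOICES)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.choices = []
        return self._obs(), {}

    def step(self, action):
        self.choices.append(int(action))
        done = len(self.choices) == self.N_VARS
        reward = self._shaped_reward() if done else 0.0
        return self._obs(), reward, done, False, {}

    def _obs(self):
        # Encode already-chosen variables in [0, 1]; unchosen slots stay 0.
        obs = np.zeros(self.N_VARS, dtype=np.float32)
        for i, c in enumerate(self.choices):
            obs[i] = (c + 1) / self.N_CHOICES
        return obs

    def _shaped_reward(self):
        # Placeholder for the paper's reward shaping: reward feasibility,
        # then penalize material cost so cheaper valid designs score higher.
        feasible = self._check_provisions()  # stand-in for GB50017-2017 checks
        cost = sum(self.choices) / (self.N_VARS * self.N_CHOICES)  # toy cost proxy
        return (1.0 - cost) if feasible else -1.0

    def _check_provisions(self):
        # Toy rule standing in for real code checks (strength, deflection, ...).
        return self.choices[0] >= 3


model = PPO("MlpPolicy", CompositeBeamEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```

After training, `model.predict(obs)` would generate design decisions step by step; the paper's actual environment would replace the toy feasibility and cost functions with full GB50017-2017 provision checks and a material cost model.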