The growing emphasis on ecological preservation and natural resource conservation has significantly advanced resource recycling, supporting the transition to a sustainable green economy. Disassembly is a pivotal stage of resource recycling, and the efficiency of the disassembly tools used plays a critical role in it. This work investigates how disassembly tools affect disassembly time and formulates a mathematical model that minimizes the workstation cycle time. To solve this model, we employ an optimized advantage actor-critic (A2C) algorithm from reinforcement learning, and we use the CPLEX solver to validate the correctness of the model. The CPLEX results confirm the feasibility of the proposed algorithm and serve as a benchmark for comparing it with both the original advantage actor-critic algorithm and the actor-critic algorithm. This comparison demonstrates the superiority of the proposed algorithm.
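The objective the model minimizes, workstation cycle time, can be illustrated with a small sketch. It does not reproduce the paper's formulation. It assumes a hypothetical tool model in which each task has a base time and a required tool, and a fixed tool-switch time is added whenever two consecutive tasks at the same workstation need different tools. Under these assumptions, the cycle time is the largest total time over all workstations. The names `station_time`, `cycle_time`, and `switch_time` are illustrative only.

```python
from typing import Dict, List, Optional


def station_time(tasks: List[str], base_time: Dict[str, float],
                 tool: Dict[str, str], switch_time: float) -> float:
    """Total time of one workstation under a hypothetical tool model:
    the sum of task base times plus `switch_time` each time consecutive
    tasks require different tools (no charge for picking up the first tool)."""
    total = 0.0
    prev_tool: Optional[str] = None
    for task in tasks:
        total += base_time[task]
        if prev_tool is not None and tool[task] != prev_tool:
            total += switch_time  # assumed fixed tool-change overhead
        prev_tool = tool[task]
    return total


def cycle_time(stations: List[List[str]], base_time: Dict[str, float],
               tool: Dict[str, str], switch_time: float) -> float:
    """Cycle time = the maximum workstation time; this is the quantity to minimize."""
    return max(station_time(s, base_time, tool, switch_time) for s in stations)


if __name__ == "__main__":
    base_time = {"a": 3.0, "b": 2.0, "c": 4.0, "d": 1.0}
    tool = {"a": "T1", "b": "T2", "c": "T1", "d": "T1"}
    stations = [["a", "b"], ["c", "d"]]
    print(cycle_time(stations, base_time, tool, switch_time=1.5))  # 6.5
```

In this toy instance, the first workstation needs a tool change (3 + 2 + 1.5 = 6.5) and the second does not (4 + 1 = 5), so the cycle time is 6.5. In a reinforcement learning setting, a score of this kind could serve as the basis for a reward signal when an agent assigns tasks to workstations.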