With the rapid growth of the new energy vehicle market, efficient management of the closed-loop supply chain for power batteries has become an important issue. Effective closed-loop supply chain management is critical because it determines how efficiently resources are used, how well environmental responsibilities are met, and whether economic benefits are realized. This paper uses the Markov decision process (MDP) to model the decision-making and coordination mechanism of the closed-loop supply chain for power batteries, in order to address management challenges such as cost, quality, and technological progress. By constructing MDP models for the different supply chain participants, the paper investigates optimization strategies for the supply chain and applies two solution methods: dynamic programming and reinforcement learning. The case study results show that the model can effectively identify optimized supply chain decisions, improve the overall efficiency of the supply chain, and coordinate the interests of the parties involved. The contribution of this study is a new modeling framework for power battery recycling, together with a demonstration of its practicality and effectiveness on empirical data. The study shows that the MDP can be a powerful tool for closed-loop supply chain management, promotes a deeper understanding of the supply chain's complex decision-making environment, and offers a new solution path for supply chain decision-making and coordination.