This paper proposes a proximal policy optimization (PPO) algorithm for synthesizing the coupling matrices of microwave filters. As filter design requirements grow more demanding, the limitations of traditional synthesis methods, such as their limited applicability, are becoming increasingly apparent. To improve synthesis efficiency, this paper constructs a reinforcement learning algorithm based on an Actor-Critic network architecture and designs a reward function and an action function tailored to coupling matrix synthesis, which allows the algorithm to solve combinatorial optimization problems stably.
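For readers unfamiliar with PPO, the following minimal sketch shows the standard clipped surrogate objective that PPO-style Actor-Critic methods maximize. It is the generic textbook form only, not the reward or action function designed in this paper; the function name `ppo_clip_objective`, its parameters, and the default clipping range of 0.2 are assumptions made for illustration.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the previous policy. advantages: advantage estimates, which
    in an Actor-Critic setup would come from the critic. All names here are
    hypothetical and chosen only for this sketch.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps] to limit the policy step.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual explanation for the stable training behavior of PPO.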