To address the problem that joints are easily damaged by impact torque when a space robot captures a non-cooperative spacecraft on orbit, a reinforcement learning control algorithm combined with a compliant mechanism is proposed to achieve buffer compliance control. The compliant mechanism not only absorbs impact energy through the deformation of its internal spring but, combined with the compliance control strategy, also limits the impact torque to a safe range. First, the dynamic models of the space robot and the target spacecraft before capture are obtained using the Lagrange approach and the Newton-Euler method. Then, based on the law of conservation of momentum and the kinematic and velocity constraints, the integrated dynamic model of the post-capture hybrid system is derived. Because the post-capture hybrid system is unstable, a buffer compliance control scheme based on reinforcement learning is proposed to stabilize it. An associative search network is employed to approximate the unknown nonlinear functions, and an adaptive critic network is used to construct the reinforcement signal that tunes the associative search network. Numerical simulation shows that the proposed control scheme reduces the impact torque acting on the joints by between 58.7% and 76.6% during the capture phase. In the stable control phase, the impact torque acting on the joints is kept within the safety threshold, which avoids overload and damage of the joint actuators.
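
The two physical ideas named above, momentum conservation across the capture and energy absorption by a spring with a torque limit, can be illustrated with a minimal sketch. It assumes a one-dimensional, perfectly inelastic capture between two point masses and an ideal linear torsional spring; it is not the multibody model or the controller of this work, and all function names, parameters, and numerical values are hypothetical.

```python
def post_capture_velocity(m_robot, v_robot, m_target, v_target):
    """Common velocity after a perfectly inelastic 1-D capture.

    Assumption: point masses, no external forces, so linear momentum
    is conserved across the capture.
    """
    return (m_robot * v_robot + m_target * v_target) / (m_robot + m_target)


def spring_torque(stiffness, deflection):
    """Torque transmitted by an ideal linear torsional spring (N*m)."""
    return stiffness * deflection


def spring_energy(stiffness, deflection):
    """Elastic energy stored in the spring at a given deflection (J)."""
    return 0.5 * stiffness * deflection ** 2


def within_safe_range(torque, threshold):
    """True if the torque magnitude does not exceed the safety threshold."""
    return abs(torque) <= threshold


if __name__ == "__main__":
    # Illustrative values only, not taken from the simulation in this work.
    v = post_capture_velocity(m_robot=100.0, v_robot=0.0,
                              m_target=50.0, v_target=0.3)
    tau = spring_torque(stiffness=500.0, deflection=0.02)
    print(f"post-capture velocity: {v:.3f} m/s")
    print(f"spring torque: {tau:.2f} N*m, "
          f"stored energy: {spring_energy(500.0, 0.02):.3f} J")
    print("within threshold:", within_safe_range(tau, threshold=20.0))
```

In this simplified picture, a softer spring transmits less torque for the same deflection but needs a larger deflection to store the same energy, which is one reason a spring alone may be paired with a control strategy that enforces the torque limit.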