When two typical continuous-time chaotic systems are synchronized, the state variables of the response system are usually bounded and satisfy the Lipschitz condition, which permits a universal synchronization method. When two high-dimensional discrete chaotic systems, such as coupled map lattices, are synchronized, the state variables of the response system often diverge, which makes a universal synchronization method difficult to find. To overcome this difficulty, hard boundedness constraints on the state variables, which correspond to safety level III in reinforcement learning terms, must be imposed whenever the discrete response system is perturbed. We propose a universal method for synchronizing two discrete systems based on safe reinforcement learning (RL). In this method, the RL agent’s policy drives the systems toward synchronization, and a safety layer added directly on top of the policy enforces safety level III. The safety layer consists of a one-step predictor for the perturbed response system and an action correction formula. The one-step predictor, based on next-generation reservoir computing, determines whether the next state of the perturbed system lies within the chaotic domain; if it does not, the action correction formula sets the corresponding component of the perturbing force to zero. In this way, the state of the perturbed system remains in the bounded chaotic domain. We demonstrate through a numerical example with two coupled map lattice systems that the proposed method achieves synchronization without divergence, with an average synchronization time of 12.66 iteration steps over 1000 different initial conditions. The method works even when parameter mismatches and disturbances are present in the system. 
We compare performance with and without the safety layer and find that synchronization succeeds only when the safety layer is present, which underscores its importance. With optimal hyper-parameters obtained by Bayesian optimization, the performance of the algorithm is robust and stable.
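
The action correction step described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the response system is taken to be a ring of diffusively coupled logistic maps with hypothetical parameters `eps` and `a`, the chaotic domain is taken to be the interval [0, 1], the perturbation is additive, the exact map stands in for the next-generation reservoir computing predictor, and random numbers stand in for the RL policy output. The names `cml_step`, `safety_layer`, and `run_demo` are likewise hypothetical.

```python
import random


def cml_step(x, eps=0.1, a=3.9):
    """One step of a ring-coupled logistic map lattice (assumed example system)."""
    f = [a * xi * (1.0 - xi) for xi in x]
    n = len(x)
    # f[i - 1] wraps to the last site when i == 0, closing the ring.
    return [(1.0 - eps) * f[i] + 0.5 * eps * (f[i - 1] + f[(i + 1) % n])
            for i in range(n)]


def safety_layer(y, u, predictor=cml_step, lo=0.0, hi=1.0):
    """Zero each perturbation component whose predicted next state leaves [lo, hi]."""
    y_free = predictor(y)  # predicted next state without perturbation
    return [ui if lo <= yi + ui <= hi else 0.0 for yi, ui in zip(y_free, u)]


def run_demo(n_sites=8, n_steps=200, seed=0):
    """Return True if the perturbed state stays in [0, 1] for every step."""
    rng = random.Random(seed)
    y = [rng.random() for _ in range(n_sites)]
    for _ in range(n_steps):
        # Random perturbation as a stand-in for the RL policy's action.
        u_raw = [rng.gauss(0.0, 0.5) for _ in range(n_sites)]
        u_safe = safety_layer(y, u_raw)
        y = [fi + ui for fi, ui in zip(cml_step(y), u_safe)]
        if not all(0.0 <= yi <= 1.0 for yi in y):
            return False
    return True
```

Under these assumptions, each perturbation component affects only its own site at the next step, so zeroing the offending components leaves those sites on the unperturbed map, which keeps [0, 1] invariant. With a learned predictor in place of the exact map, this guarantee would depend on the accuracy of the prediction.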