Abstract

A mixed adaptive dynamic programming (ADP) scheme based on zero-sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior dynamic knowledge with actual sampled data, the proposed approach effectively mitigates the degradation caused by an inaccurate dynamic model and significantly improves the training speed of the ADP algorithm. Initially, the dataset is enriched with sufficient reference data collected from a nominal model without considering modelling bias. The controlled plant also interacts with the real environment and continuously gathers sampled data into the dataset. To fully exploit the advantages of model-based and model-free methods during training, an adaptive tuning factor is introduced based on this dataset, which carries model-referenced information and conforms to the distribution of the real-world environment; the factor balances the influence of the model-based control law and the data-driven policy gradient on the direction of policy improvement. As a result, the proposed approach learns faster than purely data-driven methods while also achieving better tracking performance than model-based control methods. Moreover, the optimal control problem under disturbances is formulated as a zero-sum game, and an actor-critic-disturbance framework is introduced to approximate the optimal control input, cost function, and disturbance policy, respectively. Furthermore, the convergence of the proposed algorithm under the value iteration method is analysed. Finally, an example of AUV path following based on improved line-of-sight guidance is presented to demonstrate the effectiveness of the proposed method.
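The adaptive tuning factor described above can be illustrated with a minimal sketch. The blending rule below is an assumption for illustration only (the paper's exact update law is not given in the abstract): a scalar `alpha` in [0, 1] weights a data-driven policy gradient against a gradient derived from the nominal model-based control law, so that `alpha -> 1` trusts sampled data and `alpha -> 0` trusts the model. All names (`blended_update`, `grad_data`, `grad_model`) are hypothetical.

```python
import numpy as np

def blended_update(theta, grad_data, grad_model, alpha, lr=0.01):
    """Hypothetical sketch of blending a data-driven policy gradient with a
    model-based correction via an adaptive tuning factor alpha in [0, 1].

    theta      : policy parameter vector
    grad_data  : gradient estimated from real sampled data (model-free)
    grad_model : gradient derived from the nominal model-based control law
    alpha      : adaptive tuning factor (1 -> fully data-driven, 0 -> fully model-based)
    """
    # convex combination of the two improvement directions
    direction = alpha * grad_data + (1.0 - alpha) * grad_model
    return theta - lr * direction
```

With `alpha = 1` the step reduces to a pure data-driven gradient step; with `alpha = 0` it follows the model-based direction only, so tuning `alpha` during training trades learning speed for robustness to modelling bias.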