This paper introduces a novel optimal robust formation control method for multi-quadcopter unmanned aerial vehicle (multi-UAV) systems. First, a reinforcement learning (RL) algorithm based on a unique gradient descent training approach is proposed to solve the Hamilton–Jacobi–Bellman (HJB) equation, which removes the need for the persistent excitation (PE) condition. Second, to make the controlled system robust, an uncertainty and disturbance estimator (UDE) is developed that suppresses model uncertainty and external disturbances through filtering techniques. Furthermore, a switched sliding mode control technique based on the average dwell time (ADT) is employed to accommodate dynamic switching of the communication topology among the UAVs, and the stability of the resulting closed-loop system is established through Lyapunov analysis. Finally, simulation examples are provided to verify the effectiveness of the designed control strategy.