Robotic systems are crucial in modern manufacturing. Complex assembly tasks require the collaboration of multiple robots, and orchestrating them is challenging due to tight tolerances and precision requirements. In this work, we set up two Franka Panda robots to perform a peg-in-hole insertion task with 1 mm clearance. We structure the control system hierarchically: a central policy trained with reinforcement learning plans the robots’ feedback-based trajectories, and a low-level impedance controller on each robot executes them. To improve training convergence, we use reverse curriculum learning, novel for such a two-armed control task, which iterates between a minimum-requirements phase and a fine-tuning phase. We incorporate domain randomization by varying the initial joint configurations of the task so that the policy generalizes. After training, we test the system in simulation and examine the impact of the curriculum parameters on the resulting process time and its variance. Finally, we transfer the trained model to the real world, resulting in a small decrease in task duration. Comparing our approach to classical path planning and control shows a decrease in process time, but higher robustness to calibration errors.