Pan–Tilt–Zoom (PTZ) cameras, which can rotate and zoom freely, are widely used in border security, ecological conservation, emergency management, and military applications. A PTZ camera can automatically monitor a selected area by executing a sequence of servo operations, a process known as multi-point servo control. However, deficiencies in the servo control Software Development Kit (SDK), hardware wear, and interference from complex external environments introduce significant errors into this process. To address this issue, this paper proposes a precise multi-point servo control framework based on Deep Reinforcement Learning (DRL). Because real-world environments are complex, the reward function must account for multiple factors; we propose the directional gravity reward function for this purpose. Because training is unstable, prolonged trial-and-error interaction between the agent and the equipment can irreversibly damage the devices. The framework therefore employs a phased training approach in which the agent learns sequentially from offline, off-policy, and on-policy data, reducing direct interaction with the equipment while improving the agent’s overall performance. In addition, feature matching on images from adjacent time frames provides the real-time, accurate device status on which the system’s operation depends. Evaluation results indicate that the proposed framework significantly outperforms other methods on 4-point servo control tasks in both virtual and real-world scenarios. The operational process also takes safety into account.