Controlling the fire monitor through visual sensors is crucial for achieving automatic fire extinguishing. Because the water jet drop point (JDP) takes a certain time to respond to the motion of the fire monitor, a controller that relies directly on the image of the JDP leads to jet oscillation and long adjustment times, which reduce the effectiveness of fire extinguishing. To solve these problems, a kinematic model of the JDP with delay is first constructed by analyzing the movement of particles in the water jet. This model quantifies the delay mechanism of the JDP and mitigates its detrimental effects. Then, a visual predictive controller subject to visibility and structural constraints is used to obtain the desired angles of the fire monitor in each of its degrees of freedom. Finally, the dynamic model of the fire monitor is derived to achieve accurate tracking control of the desired angles, and a controller with a linear extended state observer is proposed to account for unknown terms resulting from factors such as jet reaction forces. Comparative experiments are conducted on a real fire-extinguishing robot to verify the effectiveness of the proposed kinematic model and controller.
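To give a rough sense of where the JDP delay comes from, the sketch below computes the flight time and landing distance of a single water particle. It is only an illustration under strong assumptions that are not taken from this work: drag-free projectile motion, flat ground, and a hypothetical function name and parameters (`jet_drop_point`, `v0`, `pitch_rad`, `nozzle_height`). The kinematic model proposed here is not reproduced.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def jet_drop_point(v0, pitch_rad, nozzle_height):
    """Return (flight_time_s, horizontal_range_m) for one drag-free particle.

    Assumptions (illustrative only): no air drag, flat ground at height 0,
    particle leaves the nozzle at speed v0 and elevation angle pitch_rad.
    """
    vx = v0 * math.cos(pitch_rad)  # horizontal velocity component
    vz = v0 * math.sin(pitch_rad)  # vertical velocity component
    # Positive root of nozzle_height + vz*t - 0.5*G*t^2 = 0.
    flight_time = (vz + math.sqrt(vz * vz + 2.0 * G * nozzle_height)) / G
    return flight_time, vx * flight_time


if __name__ == "__main__":
    t, x = jet_drop_point(v0=20.0, pitch_rad=math.radians(30.0), nozzle_height=1.5)
    print(f"flight time: {t:.2f} s, drop point distance: {x:.2f} m")
```

Under these assumptions, a change in the monitor angle shows up in the observed drop point only after roughly the flight time has elapsed, which is the kind of lag that a purely image-based feedback loop has to contend with.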