In recent years, robot-assisted minimally invasive surgery has gained significant traction in clinical practice. However, conventional contact-based human-computer interaction carries a risk of bacterial infection, which significantly limits the role of robots in surgery. To address this limitation, we propose an interaction method based on gestures and visual tags that allows surgeons to control and fine-tune surgical robots without physical contact with the environment. By encoding six gestures captured with Leap Motion, we control the surgical robot in a non-contact manner. Moreover, using ArUco markers, we accurately identify the 3D spatial position of the visual tag and define 12 fine-tuning operations for adjusting surgical instruments. To evaluate the applicability of the proposed system in surgery, we designed an experimental setup in which the system achieved sufficient precision for the task. These results indicate that our system meets the clinical standard, providing doctors with a non-contact and flexible means of interacting with robots during surgery.