Robotic surgery is preferred over open or laparoscopic surgery because of its intuitiveness and convenience. However, prolonged use of surgical robots can cause neck pain and joint fatigue in the wrist and fingers, and existing input systems are bulky and difficult to maintain. To resolve these issues, we propose a novel input module based on real-time 3D hand tracking, driven by RGB images and the MediaPipe framework, to control surgical robots such as the patient side manipulator (PSM) and the endoscopic camera manipulator (ECM) of the da Vinci Research Kit. In this paper, we explore the mathematical basis of the proposed 3D hand tracking module and provide a proof of concept through user experience (UX) studies conducted in a virtual environment. End-to-end latencies for controlling the PSM and ECM were 170 ± 10 ms and 270 ± 10 ms, respectively. Of the fifteen novice participants recruited for the UX study, thirteen reached a qualifying level of proficiency after 50 min of practice, and fatigue of the hand and wrist was imperceptible. We therefore conclude that we have developed a robust 3D hand tracking module for surgical robot control, one that could reduce hardware cost and volume as well as resolve ergonomic problems. Furthermore, the RGB-image-driven 3D hand tracking module developed in our study is applicable to diverse fields such as extended reality (XR) development and remote robot control. In addition, we provide a new standard for evaluating novel input modalities in XR environments from a UX perspective.
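As a minimal illustrative sketch (not the paper's actual control pipeline), the snippet below shows how per-frame 3D hand landmarks of the kind described above can be obtained from an RGB camera stream with MediaPipe's Hands solution; the single-hand setting, confidence thresholds, and the use of the wrist landmark are assumptions made for this example only.

```python
# Illustrative sketch: extract 3D hand landmarks from an RGB webcam stream
# with MediaPipe Hands. Not the authors' implementation; thresholds and the
# single-hand assumption are ours.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def main():
    cap = cv2.VideoCapture(0)  # RGB webcam as the only input device
    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.7,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame_bgr = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV delivers BGR frames.
            results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
            if results.multi_hand_world_landmarks:
                # World landmarks are 3D coordinates (in meters) relative to
                # the hand's approximate geometric center; a downstream
                # controller could map these to manipulator commands.
                wrist = results.multi_hand_world_landmarks[0].landmark[
                    mp_hands.HandLandmark.WRIST]
                print(f"wrist xyz: {wrist.x:+.3f} {wrist.y:+.3f} {wrist.z:+.3f}")
    cap.release()

if __name__ == "__main__":
    main()
```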