As technology advances, more research on medical interactive systems emphasizes the integration of touchless and multimodal interaction (MMI). This approach is particularly advantageous in surgical and interventional settings because it maintains sterility and promotes natural interaction. Past reviews have investigated MMI in terms of technology and interaction with robots; however, none has placed particular emphasis on analyzing this kind of interaction in surgical and interventional scenarios. Two databases were queried for relevant publications from the past 10 years. Identification was followed by two screening steps based on eligibility criteria, and a forward/backward search was added to identify further relevant publications. The analysis incorporated clustering of the references by addressed medical field, input and output modalities, and challenges regarding development and evaluation. A sample of 31 references was obtained (16 journal articles, 15 conference papers). MMI was predominantly developed for laparoscopy and radiology and for interaction with image viewers. The majority implemented two input modalities, with voice-hand interaction being the most common combination: voice for discrete tasks and hand gestures for continuous navigation tasks. The application of gaze, body, and facial control is minimal, primarily because of ergonomic concerns. Feedback was included in 81% of the publications, with visual cues being applied most often. This work systematically reviews MMI for surgical and interventional scenarios over the past decade. For future research, we propose an increased focus on in-depth analyses of the considered use cases and on the application of standardized evaluation methods. Moreover, insights from other sectors, including but not limited to the gaming sector, should be exploited.