Abstract

Computer vision and robotic assistance are increasingly being used to improve the quality of surgical interventions. Tool tracking is critical in interventions such as endoscopy, laparoscopy, and retinal microsurgery (RM), where, unlike in open surgery, surgeons do not have direct visual and physical access to the surgical site. RM is performed with miniaturized tools and requires the surgeon to carefully observe the surgical site through a microscope. Tracking of surgical tools primarily enables robotic assistance during surgery and also serves as a means to assess surgical quality, which is extremely useful during surgical training. In this paper we propose deep-learning-based visual tracking of a surgical tool using late fusion of convolutional neural network (CNN) responses, which comprises three steps: (i) training a CNN to localize the tool tip in a frame; (ii) coarsely estimating the tool-tip region using the trained CNN; and (iii) performing a finer search around the estimated region to accurately localize the tool tip. Scale-invariant tracking is ensured by multi-scale late fusion, where CNN responses are obtained at each level of a Gaussian scale-decomposition pyramid. The performance of the proposed method is experimentally validated on the publicly available retinal microscopy instrument tracking (RMIT) dataset (https://sites.google.com/site/sznitr/code-and-datasets). Our method tracks tools with a maximum accuracy of \(99.13\%\), which substantiates its efficacy in comparison to existing approaches.
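The multi-scale late-fusion idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `fake_cnn_response` is a hypothetical stand-in for the trained CNN, and the pyramid uses simple box-filter downsampling in place of a true Gaussian pyramid. Responses computed at each pyramid level are upsampled to full resolution and summed (late fusion), and the fused map's maximum gives the coarse tool-tip estimate.

```python
import numpy as np

def downsample(img, factor=2):
    # box-blur + subsample stand-in for one Gaussian pyramid level
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    return img[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def fake_cnn_response(img):
    # hypothetical stand-in for the trained CNN: the "response map"
    # is just the normalized input, peaking where the tool tip is
    return img / (img.max() + 1e-9)

def track_tip_multiscale(frame, levels=3):
    # late fusion: sum per-level CNN responses after upsampling each
    # back to full resolution, then take the argmax of the fused map
    h, w = frame.shape
    fused = np.zeros((h, w))
    img = frame
    for _ in range(levels):
        resp = fake_cnn_response(img)
        ry = np.repeat(np.repeat(resp, h // resp.shape[0], axis=0),
                       w // resp.shape[1], axis=1)  # nearest-neighbour upsample
        fused += ry
        img = downsample(img)
    cy, cx = np.unravel_index(np.argmax(fused), fused.shape)
    return cy, cx  # coarse tool-tip estimate (row, col)

# usage: a synthetic frame with a bright blob standing in for the tool tip
yy, xx = np.mgrid[:64, :64]
frame = np.exp(-((yy - 20) ** 2 + (xx - 30) ** 2) / 20.0)
print(track_tip_multiscale(frame))
```

In the paper's pipeline this coarse estimate would then be refined by the finer local search of step (iii); the sketch stops at the fused coarse localization.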
