Abstract
INTRODUCTION: Minimally invasive robotic surgery offers the surgeon increased dexterity and precision, more comfortable seating, and depth perception. In particular, the stereo-endoscopic camera of the daVinci robot provides the surgeon with a high-resolution 3D view of the surgical scene inside the patient's body. To exploit this depth information in advanced computational tools (such as augmented reality or collision detection), we need a fast and accurate stereo matching algorithm, which computes the disparity (pixel shift) map between the left and right images. To improve the trade-off between speed and accuracy, we propose an efficient multi-scale approach that overcomes a limitation of standard multi-scale methods: interpolation artifacts introduced when intermediate disparity results are upsampled from a coarser to a finer scale. METHODS: Standard stereo matching algorithms perform an exhaustive search for the most similar patch between the reference and target images (along the same horizontal line when the images are rectified). This requires a wide search range in the target image to ensure that the pixel corresponding to each reference pixel is found (Figure 1). To optimize this search, we propose a multi-scale approach that uses the disparity map computed in the previous iteration at lower resolution. Instead of placing the search region in the target image directly at the pixel position from the reference image, we shift it by the corresponding disparity value from the previous iteration and reduce its width, since the shifted region is expected to lie closer to the optimal solution. We also add two search regions shifted by the disparity values at the left and right adjacent pixel positions (Figure 2) to avoid errors typically caused by interpolation artifacts when the disparity map is resized. To avoid large overlaps between the search regions, we add the two extra regions only where the disparity map has strong gradients.
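The refinement step described in METHODS can be illustrated with a minimal CPU sketch in Python with NumPy. This is not the authors' GPU implementation. The function names, the sign convention (a reference pixel at column x is matched to the target pixel at column x + d), the gradient threshold, and the decimation and nearest-neighbour upsampling between scales are all assumptions made for illustration.

```python
import numpy as np


def patch_ssd(ref, tgt, y, x, xt, r=1):
    """SSD between the (2r+1)x(2r+1) patches centred at ref[y, x] and tgt[y, xt]."""
    a = ref[y - r:y + r + 1, x - r:x + r + 1]
    b = tgt[y - r:y + r + 1, xt - r:xt + r + 1]
    return float(np.sum((a - b) ** 2))


def refine_disparity(ref, tgt, prior, width=16, r=1, grad_thresh=1.0):
    """One pass: search narrow windows centred on prior disparities (r >= 1).

    Assumed convention: ref[y, x] corresponds to tgt[y, x + d].
    grad_thresh is a hypothetical threshold for a "strong" disparity gradient.
    """
    ref = ref.astype(np.float64)
    tgt = tgt.astype(np.float64)
    h, w = ref.shape
    half = width // 2
    out = prior.astype(np.float64).copy()  # border pixels keep their prior value
    for y in range(r, h - r):
        for x in range(r, w - r):
            # The first search region is centred on the pixel's own prior disparity.
            centres = {int(round(prior[y, x]))}
            # Two extra regions, taken from the left and right neighbours,
            # are added only where the prior disparity map changes strongly.
            left, right = prior[y, x - 1], prior[y, x + 1]
            if abs(right - left) > grad_thresh:
                centres.update((int(round(left)), int(round(right))))
            best_cost, best_d = np.inf, out[y, x]
            for c in sorted(centres):
                for d in range(c - half, c + half):
                    xt = x + d
                    if xt < r or xt >= w - r:
                        continue  # candidate patch would leave the image
                    cost = patch_ssd(ref, tgt, y, x, xt, r)
                    if cost < best_cost:
                        best_cost, best_d = cost, d
            out[y, x] = best_d
    return out


def coarse_to_fine(ref, tgt, levels=5, width=16, r=1):
    """Run refine_disparity from the coarsest to the finest scale, starting at 0."""
    disp = None
    for lvl in reversed(range(levels)):
        s = 2 ** lvl
        # Plain decimation without smoothing (a simplification for this sketch).
        r_img, t_img = ref[::s, ::s], tgt[::s, ::s]
        if disp is None:
            prior = np.zeros(r_img.shape)
        else:
            # Nearest-neighbour upsampling; disparities double with resolution.
            prior = 2.0 * np.repeat(np.repeat(disp, 2, axis=0), 2, axis=1)
            prior = prior[:r_img.shape[0], :r_img.shape[1]]
        disp = refine_disparity(r_img, t_img, prior, width=width, r=r)
    return disp
```

On a synthetic pair where the target is the reference shifted by 4 columns, `coarse_to_fine(ref, tgt, levels=2, width=8)` recovers a disparity of 4 for interior pixels, whereas a single 8-pixel-wide search centred on zero cannot reach that value. This is the sense in which the prior from the coarser scale allows a narrower search region.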
MATERIAL: We used stereo images from the Middlebury dataset (http://vision.middlebury.edu/stereo/data/) and stereo-endoscopic images captured at full HD 1080i resolution using a daVinci S/Si HD Surgical System. Experiments were performed with a GPU implementation on a workstation with 128 GB of RAM, an Intel Xeon E5-2690 processor, and an NVIDIA Tesla C2075. RESULTS: We compared the accuracy and speed of the standard and proposed methods on ten images from the Middlebury dataset, which provides ground-truth disparity maps. We used the sum of squared differences (SSD) as the similarity metric between 3x3 patches in the left and right rectified images, resized to half their original size (665x555). For the standard method, we set the search range offset and width to -25 and 64 pixels, respectively. For the proposed method, we initialized the disparity to 0 and ran five iterations with a search range width of 16 pixels. The results in Table 1 show that the proposed method improved the average accuracy by 27% without affecting the average computation time of 120 ms. CONCLUSION: We proposed an efficient multi-scale stereo matching algorithm that significantly improves accuracy without compromising speed. In future work, we will investigate the benefits of a similar approach that exploits temporal consistency between successive frames, as well as its use in more advanced computational tools for image-guided surgery.