Abstract

Automatic surgical instrument segmentation is a necessary step for the reliable operation of surgical robots, and its accuracy directly affects the surgical outcome. Nevertheless, accurately segmenting surgical instruments from endoscopic images remains challenging because of the complex environment and instrument motion during surgery. Based on the encoder–decoder structure, a transformer-based multiscale fusion network, named TMF-Net, is proposed to address these difficulties. To obtain effective feature representations and combine the complementary strengths of the pretrained ResNet34 and the transformer, a dual-encoder unit is proposed that simultaneously learns the semantic relationships between adjacent and distant pixels and comprehensively captures global context information. Meanwhile, to retain more contextual information, a trapezoid atrous spatial pyramid pooling (trapezoid ASPP) block is proposed to enhance local features with different receptive fields and enrich the feature representation. Furthermore, to handle the multiscale nature of surgical instruments in endoscopic images, a multiscale attention fusion (MAF) block is proposed to fuse multiscale feature maps and direct the network's attention to the most informative channels, thereby improving segmentation accuracy. Two typical datasets, Kvasir-Instrument and Endovis2017, are used for performance analysis and verification. Experimental results indicate that the proposed TMF-Net effectively improves the segmentation accuracy of surgical instruments and yields competitive results in comparison with advanced segmentation methods.
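The abstract names three architectural ideas: a dual ResNet34/transformer encoder, a trapezoid ASPP block, and a multiscale attention fusion (MAF) block. The PyTorch sketch below illustrates one plausible realisation of each idea under stated assumptions; the dilation rates, the squeeze-and-excitation style channel gate, the additive fusion, and the transformer depth and head count are all illustrative choices, not the paper's implementation.

# Minimal, illustrative sketch of the three ideas named in the abstract.
# Block internals (dilation rates, channel sizes, fusion strategy) are
# assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn
from torchvision.models import resnet34


class TrapezoidASPP(nn.Module):
    """Parallel atrous convolutions with different receptive fields
    (the dilation rates here are assumed), concatenated and projected."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 6)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))


class MultiscaleAttentionFusion(nn.Module):
    """Fuses two feature maps and re-weights channels with a
    squeeze-and-excitation style gate (an assumed reading of 'MAF')."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        # Upsample the coarser map to the finer map's resolution, add, gate.
        high = nn.functional.interpolate(high, size=low.shape[-2:],
                                         mode="bilinear", align_corners=False)
        fused = low + high
        return fused * self.gate(fused)


class DualEncoder(nn.Module):
    """ResNet34 features refined by a small transformer encoder to capture
    long-range context (depth and head count are assumptions)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        # weights=None keeps the sketch offline; the paper uses a
        # pretrained ResNet34 backbone.
        backbone = resnet34(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # 512 ch
        self.proj = nn.Conv2d(512, embed_dim, 1)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        f = self.proj(self.cnn(x))                 # B x C x H x W
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)      # B x (H*W) x C
        tokens = self.transformer(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    enc = DualEncoder()
    aspp = TrapezoidASPP(256, 256)
    maf = MultiscaleAttentionFusion(256)
    x = torch.randn(1, 3, 224, 224)
    feats = aspp(enc(x))
    print(maf(feats, feats).shape)   # torch.Size([1, 256, 7, 7])

In a full encoder–decoder network of this kind, the MAF block would typically fuse decoder features with skip connections from several encoder stages rather than a feature map with itself; the self-fusion above is only to keep the sketch self-contained.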
