Probe-based confocal laser endomicroscopy (pCLE) has a role in characterising tissue intraoperatively to guide tumour resection during surgery. To capture good-quality pCLE data, which is important for diagnosis, probe-tissue contact must be maintained within a micrometre-scale working range. This can be achieved through micro-surgical robotic manipulation, which requires automatic estimation of the probe-tissue distance. In this paper, we propose a novel deep regression framework composed of a Deep Regression Generative Adversarial Network (DR-GAN) and a Sequence Attention (SA) module. The aim of DR-GAN is to train the network using an enhanced image-based supervision approach. It extends the standard generator by using a well-defined function for image generation instead of a learnable decoder. DR-GAN also uses a novel learnable neural perceptual loss that, for the first time, combines spatial- and frequency-domain features, effectively suppressing the adverse effects of noise in the pCLE data. To incorporate temporal information, we designed the SA module, a cross-attention module enhanced with Radial Basis Function (RBF) based encoding (SA-RBF). Furthermore, we designed a multi-step training mechanism to train the regression framework. During inference, the trained network generates data representations that are fused over time in the SA-RBF module to boost regression stability. Our proposed network advances state-of-the-art (SOTA) networks by addressing the challenge of excessive noise in pCLE data and enhancing regression stability. It outperforms SOTA networks on the pCLE Regression dataset (PRD) in terms of accuracy, data quality and stability.
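To illustrate the idea of a perceptual loss that combines spatial- and frequency-domain features, the sketch below compares feature maps of generated and target images both directly and via their FFT magnitudes. This is a minimal, hedged example; the feature extractor, layer choice, distance metric and weighting are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class SpatialFrequencyPerceptualLoss(nn.Module):
    """Sketch of a perceptual loss combining spatial- and frequency-domain
    feature comparisons. Extractor and weights are illustrative assumptions."""

    def __init__(self, feature_extractor: nn.Module, freq_weight: float = 0.5):
        super().__init__()
        self.features = feature_extractor   # e.g. a small CNN encoder (assumed)
        self.freq_weight = freq_weight

    def forward(self, generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        f_gen = self.features(generated)    # (B, C, H, W) feature maps
        f_tgt = self.features(target)

        # Spatial-domain term: L1 distance between feature maps.
        spatial = torch.mean(torch.abs(f_gen - f_tgt))

        # Frequency-domain term: L1 distance between FFT magnitudes of the same
        # features, intended to penalise noise-like high-frequency discrepancies.
        mag_gen = torch.abs(torch.fft.rfft2(f_gen))
        mag_tgt = torch.abs(torch.fft.rfft2(f_tgt))
        frequency = torch.mean(torch.abs(mag_gen - mag_tgt))

        return spatial + self.freq_weight * frequency
```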
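Similarly, the following sketch shows one way an RBF-encoded cross-attention module could fuse per-frame representations along time for distance regression. The centre count, bandwidth parameterisation, attention layout and regression head are assumptions for illustration, not the paper's exact SA-RBF design.

```python
import torch
import torch.nn as nn

class RBFTemporalEncoding(nn.Module):
    """Gaussian RBF encoding of frame timestamps (centres/bandwidths assumed)."""

    def __init__(self, num_centres: int = 16, max_time: float = 1.0):
        super().__init__()
        self.centres = nn.Parameter(torch.linspace(0.0, max_time, num_centres))
        self.log_sigma = nn.Parameter(torch.zeros(num_centres))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (B, T) frame times -> (B, T, num_centres) RBF activations.
        diff = t.unsqueeze(-1) - self.centres
        return torch.exp(-0.5 * (diff / torch.exp(self.log_sigma)) ** 2)

class SequenceAttentionRBF(nn.Module):
    """Sketch: the current frame attends to the whole sequence, with RBF-encoded
    time added to the per-frame features before cross-attention."""

    def __init__(self, dim: int = 128, heads: int = 4, num_centres: int = 16):
        super().__init__()
        self.time_enc = RBFTemporalEncoding(num_centres)
        self.time_proj = nn.Linear(num_centres, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)   # regresses the probe-tissue distance

    def forward(self, feats: torch.Tensor, times: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, dim) per-frame representations, times: (B, T) timestamps.
        enc = feats + self.time_proj(self.time_enc(times))
        query = enc[:, -1:, :]                 # current frame as the query
        fused, _ = self.attn(query, enc, enc)  # cross-attention over the sequence
        return self.head(fused.squeeze(1))     # (B, 1) distance estimate
```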