Abstract
Pancreatic cancer is one of the most malignant cancers, with high mortality. The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnostic workflow for pancreatic cancer by having on-site pathologists immediately analyze fast-stained cytopathological images. However, the wider adoption of ROSE diagnosis has been hindered by a shortage of experienced pathologists. Deep learning has great potential for the automatic classification of ROSE images in diagnosis, but modeling the complicated local and global image features is challenging. The traditional convolutional neural network (CNN) structure can effectively extract spatial features, but it tends to ignore global features when prominent local features are misleading. In contrast, the Transformer structure excels at capturing global features and long-range relations, but it has limited ability to utilize local features. We propose a multi-stage hybrid Transformer (MSHT) to combine the strengths of both: a CNN backbone robustly extracts multi-stage local features at different scales as attention guidance, and a Transformer encodes them for sophisticated global modeling. Going beyond the strengths of either method alone, MSHT enhances the Transformer's global modeling ability with local guidance from the CNN features. To evaluate the method in this unexplored field, we collected a dataset of 4240 ROSE images, on which MSHT achieves 95.68% classification accuracy with more accurate attention regions. The clearly superior results relative to state-of-the-art models make MSHT highly promising for cytopathological image analysis. The code and records are available at: https://github.com/sagizty/Multi-Stage-Hybrid-Transformer.