Abstract

Transformer-based methods offer a good opportunity to model the global context of gigapixel whole slide images (WSIs). However, two main problems remain in applying Transformers to WSI-based survival analysis. First, training data for survival analysis is limited, which makes models prone to overfitting; the problem is worse for Transformer-based models, which require large-scale data to train. Second, a WSI has extremely high resolution (up to 150,000 x 150,000 pixels) and is typically organized as a multi-resolution pyramid. A vanilla Transformer cannot model this hierarchical structure (such as patch cluster-level relationships), and therefore cannot learn a hierarchical WSI representation. To address these problems, we propose a novel Sparse and Hierarchical Transformer (SH-Transformer) for survival analysis. Specifically, we introduce sparse self-attention to alleviate overfitting, and we propose a hierarchical Transformer structure to learn the hierarchical WSI representation. Experimental results on three WSI datasets show that the proposed framework outperforms state-of-the-art methods.
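To make the idea of sparse self-attention concrete, the sketch below shows one common way to sparsify attention: each query keeps only its k largest attention scores and ignores the rest. The abstract does not state how SH-Transformer imposes sparsity, so the top-k rule, the function name `topk_sparse_self_attention`, and all shapes and parameters here are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def topk_sparse_self_attention(x, w_q, w_k, w_v, k):
    """Single-head self-attention in which each query attends to its k best keys.

    Hypothetical illustration only: top-k masking is an assumed sparsity rule,
    not necessarily the one used by SH-Transformer.
    x: (n, d) patch features; w_q, w_k, w_v: (d, d) projections.
    """
    q, key, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ key.T / np.sqrt(q.shape[-1])            # (n, n) scaled dot products
    # k-th largest score in every row, used as a per-query threshold
    kth = np.partition(scores, -k, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)    # drop all other keys
    masked = masked - masked.max(axis=-1, keepdims=True) # stabilise the softmax
    weights = np.exp(masked)                             # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage: 16 patch embeddings of dimension 8, each attending to 4 patches.
rng = np.random.default_rng(0)
n, d, k = 16, 8, 4
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
out, weights = topk_sparse_self_attention(x, w_q, w_k, w_v, k)
print(out.shape, (weights > 0).sum(axis=-1))
```

Restricting each query to a few keys reduces the number of effective pairwise interactions the model can fit, which is one plausible route by which sparsity could limit overfitting on small survival cohorts.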
