An integral stage in typical digital pathology workflows is deriving features from tiles extracted from a tessellated whole-slide image. Various computer vision neural network architectures, particularly ImageNet-pretrained models, have been used extensively in this domain. This study critically analyzes multiple tile-encoding strategies to understand the extent of transfer learning and to identify the most effective approach. We categorize neural network performance by 3 weight initialization methods: random, ImageNet-based, and self-supervised learning. Additionally, we propose a framework based on task-specific self-supervised learning that introduces a shallow feature extractor employing a spatial-channel attention block to capture distinctive features suited to the intricacies of histopathology. Across 2 downstream classification tasks (patch classification and weakly supervised whole-slide image classification) and diverse classification data sets, including colorectal cancer histology, Patch Camelyon, prostate cancer detection, The Cancer Genome Atlas, and CIFAR-10, our task-specific self-supervised encoding approach consistently outperforms other convolutional neural network–based encoders. These performance gains highlight the potential of task-specific, attention-based self-supervised training to tailor feature extraction for histopathology, signaling a shift away from pretrained models originating outside the histopathology domain. Our study supports the idea that task-specific self-supervised learning enables domain-specific feature extraction and encourages a more focused analysis.
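The abstract does not specify the internals of the spatial-channel attention block, but the general pattern it names (channel attention followed by spatial attention, applied sequentially to a feature map) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the two-layer channel-gating weights `w1` and `w2`, the use of combined average- and max-pooled descriptors, and the fixed 3x3 box filter standing in for a learned spatial convolution are all hypothetical choices, not the paper's architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Gate each channel of x (C, H, W) by a scalar in (0, 1).

    w1 (r, C) and w2 (C, r) are hypothetical bottleneck weights; the
    descriptor combines spatial average- and max-pooling per channel.
    """
    avg = x.mean(axis=(1, 2))                       # (C,)
    mx = x.max(axis=(1, 2))                         # (C,)
    scale = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                    + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,)
    return x * scale[:, None, None]

def spatial_attention(x):
    """Gate each spatial position of x (C, H, W) by a scalar in (0, 1).

    A fixed 3x3 box filter replaces the learned convolution here,
    purely to keep the sketch dependency-free.
    """
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    m = pooled.mean(axis=0)                             # (H, W)
    p = np.pad(m, 1, mode="edge")                       # edge-padded for the filter
    h, w = m.shape
    conv = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return x * sigmoid(conv)[None, :, :]

def spatial_channel_attention(x, w1, w2):
    # Channel gating first, then spatial gating, as in CBAM-style blocks.
    return spatial_attention(channel_attention(x, w1, w2))

# Demo on a random 8-channel 4x4 feature map with hypothetical weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
out = spatial_channel_attention(x, w1, w2)
```

Because both gates are sigmoids, the block can only attenuate activations, never amplify them; in the actual framework the weights would be learned end-to-end during the task-specific self-supervised pretraining.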