Current deep learning methods in histopathology are limited by the scarcity of available data and the time required to label it. Colorectal cancer (CRC) tumor budding quantification performed on H&E-stained slides is crucial for cancer staging and prognosis, but it requires labor-intensive annotation and is subject to human bias. Thus, acquiring a large-scale, fully annotated dataset for training a tumor budding (TB) segmentation/detection system is difficult. Here, we present a DatasetGAN-based approach that can generate an essentially unlimited number of images with TB masks from a moderate number of unlabeled images and a few annotated images. The images generated by our model closely resemble real colon tissue on H&E-stained slides. We test the performance of this model by training a downstream segmentation model, UNet++, on the generated images and masks. Our results show that the trained UNet++ model can achieve reasonable TB segmentation performance, especially at the instance level. This study demonstrates the potential of developing an annotation-efficient segmentation model for automatic TB detection and quantification.