Abstract

Introduction. Clinical datasets for training deep learning (DL) models often exhibit high levels of heterogeneity due to factors such as differences in patient characteristics, new medical techniques, and physician preferences. In recent years, hydrogel spacers have been used in some prostate cancer patients receiving radiotherapy to separate the prostate from the rectum, which better spares the rectum while maintaining adequate dose coverage of the prostate. However, the spacer substantially alters the appearance of computed tomography images, which in turn reduces the contouring accuracy of auto-segmentation algorithms. The result is a highly heterogeneous dataset.

Methods. To address this issue, we propose to identify underlying clusters within the dataset and use the cluster labels to guide segmentation. We collected a clinical dataset of 909 patients, including those with two types of hydrogel spacers and those without. First, we trained a DL model to locate the prostate and limit our field of view to the local area surrounding the prostate and rectum. We then used Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction and employed k-means clustering to assign each patient to a cluster. To leverage this clustered data, we propose a text-guided segmentation model, contrastive language-image pre-training (CLIP)-UNet, which encodes the cluster information using a text encoder and combines the encoded text information with image features for segmentation.

Results. The UMAP results indicated up to three clusters within the dataset. CLIP-UNet with cluster information achieved a Dice score of 86.2%, compared with 84.4% for the baseline UNet. Additionally, CLIP-UNet outperformed other state-of-the-art models with or without cluster information.

Conclusion. Automatic clustering assisted by DL can reveal hidden data clusters in clinical datasets, and CLIP-UNet effectively utilizes the cluster labels to achieve higher segmentation performance.
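For readers unfamiliar with the reported metric, the Dice score measures the overlap between a predicted contour and a reference contour as 2|A ∩ B| / (|A| + |B|). The snippet below is a minimal illustrative sketch, not the authors' evaluation code: the function name `dice_score`, the flat 0/1 mask representation, and the convention for two empty masks are all assumptions made for this example.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 4 predicted voxels, 4 reference voxels, 3 of them overlapping.
pred = [1, 1, 1, 1, 0, 0]
target = [0, 1, 1, 1, 1, 0]
score = dice_score(pred, target)  # 2 * 3 / (4 + 4) = 0.75
```

A score of 1.0 indicates identical masks and 0.0 indicates no overlap, so the reported 86.2% corresponds to a value of 0.862 on this scale.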