Background and purposeAutosegmentation techniques are emerging as time-saving means for radiation therapy (RT) contouring, but the understanding of their performance on different datasets is limited. The aim of this study was to determine agreement between rectal volumes by an existing autosegmentation algorithm and manually-delineated rectal volumes in prostate cancer RT. We also investigated contour quality by different-sized training datasets and consistently-curated volumes for retrained versions of this same algorithm.Materials and methodsSingle-institutional data from 624 prostate cancer patients treated to 50–70 Gy were used. Manually-delineated clinical rectal volumes (clinical) and consistently-curated volumes recontoured to one anatomical guideline (reference) were compared to autocontoured volumes by a commercial autosegmentation tool based on deep-learning (v1; n = 891, multiple-institutional data) and retrained versions using subsets of the curated volumes (v32/64/128/256; n = 32/64/128/256). Evaluations included dose-volume histogram metrics, Dice similarity coefficients, and Hausdorff distances; differences between groups were quantified using parametric or non-parametric hypothesis testing.ResultsVolumes by v1-256 (76–78 cm3) were larger than reference (75 cm3) and clinical (76 cm3). Mean doses by v1-256 (24.2–25.2 Gy) were closer to reference (24.2 Gy) than to clinical (23.8 Gy). Maximum doses were similar for all volumes (65.7–66.0 Gy). Dice for v1-256 and reference (0.87–0.89) were higher than for v1-256 and clinical (0.86–0.87) with corresponding Hausdorff comparisons including reference smaller than comparisons including clinical (5–6 mm vs. 7–8 mm).ConclusionUsing small single-institutional RT datasets with consistently-defined rectal volumes when training autosegmentation algorithms created contours of similar quality as the same algorithm trained on large multi-institutional datasets.