Background and purpose: Normal tissue complication probability (NTCP) models are developed from large retrospective datasets in which automatic contouring is often used to delineate the organs at risk. This study proposes a methodology to estimate how discrepancies between two sets of contours are reflected in NTCP model performance. We apply this methodology to heart contours within a dataset of non-small cell lung cancer (NSCLC) patients.

Materials and methods: One of the contour sets is designated the ground truth, and a dosimetric parameter derived from it is used to simulate outcomes via a predefined NTCP relationship. For each simulated outcome, the selected dosimetric parameter from each contour set is used individually to fit a toxicity model, and the performance of the resulting models is compared. Our dataset comprised 605 stage IIA-IIIB NSCLC patients. Manual, deep learning, and atlas-based heart contours were available.

Results: How contour differences were reflected in NTCP model performance depended on the slope of the predefined model, the dosimetric parameter used, and the size of the cohort. The impact of contour differences on NTCP model performance increased with steeper NTCP curves. In our dataset, parameters in the low range of the dose-volume histogram were more robust to contour differences.

Conclusions: Our methodology can be used to estimate whether a given contouring model is fit for NTCP model development. For the heart in comparable datasets, average Dice should be at least as high as between our manual and deep learning contours for shallow NTCP relationships (88.5 ± 4.5%) and higher for steep relationships.
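The simulation idea described in the methods can be illustrated with a minimal sketch. It is not the study's implementation, and everything specific in it is an assumption made for illustration: the logistic form of the NTCP curve, the names `ntcp`, `auc` and `simulate`, the values of `d50` and `gamma50`, the synthetic uniformly distributed doses, and the Gaussian perturbation standing in for contour discrepancies. As a simplification, it compares discrimination (AUC) of the two dose parameters directly rather than fitting a toxicity model for each; for a single-predictor logistic model with a positive fitted slope, the two give the same AUC.

```python
import math
import random


def ntcp(dose, d50, gamma50):
    """Logistic NTCP curve; gamma50 is the normalised slope at d50 (assumed form)."""
    return 1.0 / (1.0 + math.exp(4.0 * gamma50 * (1.0 - dose / d50)))


def auc(scores, outcomes):
    """Rank-based AUC (Mann-Whitney statistic), counting ties as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one outcome class occurs
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_patients=200, d50=20.0, gamma50=1.0, noise_sd=2.0,
             n_repeats=30, seed=0):
    """Return mean AUC for the ground-truth and the alternative dose parameter."""
    rng = random.Random(seed)
    # Hypothetical dose parameter (Gy) derived from the ground-truth contours.
    truth = [rng.uniform(0.0, 40.0) for _ in range(n_patients)]
    # Hypothetical alternative contours: same parameter plus a random discrepancy.
    alt = [max(0.0, d + rng.gauss(0.0, noise_sd)) for d in truth]

    auc_truth, auc_alt = [], []
    for _ in range(n_repeats):
        # Outcomes are always simulated from the ground-truth parameter.
        outcomes = [int(rng.random() < ntcp(d, d50, gamma50)) for d in truth]
        a_truth = auc(truth, outcomes)
        if math.isnan(a_truth):
            continue  # skip degenerate repeats with a single outcome class
        auc_truth.append(a_truth)
        auc_alt.append(auc(alt, outcomes))

    if not auc_truth:
        return float("nan"), float("nan")
    return sum(auc_truth) / len(auc_truth), sum(auc_alt) / len(auc_alt)


if __name__ == "__main__":
    for slope in (0.5, 1.0, 2.0):
        mean_truth, mean_alt = simulate(gamma50=slope)
        print(f"gamma50={slope}: AUC truth={mean_truth:.3f}, alt={mean_alt:.3f}")
```

Varying `gamma50`, `noise_sd` and `n_patients` in this sketch mimics the three factors the results name: the slope of the predefined model, the size of the contour-induced discrepancy in the dosimetric parameter, and the cohort size.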