Background and purpose: While the inclusion of spatial dose information in deep learning (DL)-based normal-tissue complication probability (NTCP) models has been the focus of recent research, external validation is still lacking. This study aimed to externally validate a DL-based NTCP model for mandibular osteoradionecrosis (ORN) trained on 3D radiation dose distribution maps and clinical variables.

Methods and materials: A 3D DenseNet-40 convolutional neural network (3D-mDN40) was trained on clinical variables and radiation dose distribution maps from a retrospective, class-balanced matched cohort of 184 subjects. A second model (3D-DN40) was trained on dose maps only, and both DL models were compared with a logistic regression (LR) model trained on DVH metrics and clinical variables. All models were externally validated in terms of discriminative ability and calibration on an independent dataset of 82 subjects.

Results: No significant difference in performance was observed between models. In internal validation, the models exhibited similar Brier scores of around 0.2 and log loss values of 0.6–0.7; ROC AUC values were around 0.7 (internal) and 0.6 (external). Differences in clinical variable distributions and their effect sizes were observed between the internal and external cohorts, such as for smoking status (0.6 vs. 0.1) and chemotherapy (0.1 vs. −0.5), respectively.

Conclusion: To our knowledge, this is the first study to externally validate a multimodality DL-based ORN NTCP model. Utilising mandible dose distribution maps, these models show promise for enhancing spatial risk assessment and guiding dental and oncological decision-making, though further research is essential to address overfitting and domain shift before reliable clinical use.
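
For readers less familiar with the metrics reported in the Results, the following is a minimal sketch of how the Brier score, log loss and ROC AUC can be computed from binary outcomes and predicted probabilities. It is not the study's evaluation code: the function names and the toy data are hypothetical and chosen for illustration only.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the observed binary outcomes (0 is perfect)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = ORN, 0 = no ORN); not taken from the study.
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.6, 0.4, 0.2]

print(f"Brier score: {brier_score(y_true, y_prob):.3f}")
print(f"Log loss:    {log_loss(y_true, y_prob):.3f}")
print(f"ROC AUC:     {roc_auc(y_true, y_prob):.3f}")
```

As a point of reference, a model that always predicts a probability of 0.5 obtains a Brier score of 0.25 and a log loss of ln 2 ≈ 0.693. On a class-balanced cohort, that constant prediction is the uninformative baseline against which such values can be read.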