Multi-modal image analysis using deep learning (DL) lays the foundation for monitoring response to neoadjuvant treatment (NAT). However, existing methods prioritize extracting multi-modal features to enhance predictive performance, with limited consideration of real-world clinical applicability, particularly in longitudinal NAT scenarios involving multi-modal data. Here, we propose the Multi-modal Response Prediction (MRP) system, designed to mimic how physicians assess NAT response in breast cancer in real-world practice. To enhance feasibility, MRP integrates cross-modal knowledge mining and a temporal information embedding strategy, which allow it to handle missing modalities and make it less sensitive to differences in NAT settings. We validated MRP through multi-center studies and multinational reader studies. MRP showed robustness comparable to that of breast radiologists and outperformed human readers in predicting pathological complete response in the Pre-NAT phase (ΔAUROC of 14% and 10% on the in-house and external datasets, respectively). Furthermore, we assessed the clinical utility of MRP in treatment decision-making. MRP may have profound implications for enrolment into NAT trials and for determining the extent of surgery.