Resilient Sinkhorn-Based Optimal Transport Late Fusion Framework for Breast Cancer Diagnosis.

Abstract

This research aims to develop and evaluate a clinically deployable multimodal deep learning framework for breast cancer diagnosis that remains robust even when clinical data are asynchronous, unpaired, or incomplete, addressing real-world challenges of data heterogeneity and fragmented clinical workflows. In this retrospective study, a multimodal deep learning architecture was developed that integrates histopathological images with structured clinical risk factors. Custom models were developed and independently trained for each modality, and late fusion was achieved via a dynamically reweighted Sinkhorn-based fusion layer. Model performance was evaluated using the area under the precision-recall curve (PR-AUC), recall, F1 score, and Brier score under complete and partial modality availability scenarios. Robustness and clinical utility were further assessed through statistical significance testing and decision curve analysis (DCA). Additionally, we employed a Sinkhorn cost matrix to enhance interpretability. The proposed Sinkhorn fusion model outperformed all baseline methods, achieving the highest recall (0.96), PR-AUC (0.775), and F1 score (0.828), and the best calibration (Brier score ≈ 0.19). Notably, it maintained perfect recall (1.00) under a 50% simulated modality dropout, despite a significant drop in PR-AUC (20% vs 0%: t = -20.35, P < .0001; 50% vs 0%: t = 88.60, P < .0001), indicating strong overall robustness to missing information. Under internally controlled conditions, DCA demonstrated superior clinical utility across thresholds of 0.2 to 0.7. The model's ability to accommodate unpaired and incomplete clinical inputs while maintaining both calibration and sensitivity makes it particularly well suited for deployment in asynchronous and resource-constrained settings. Its consistent performance under clinical uncertainty and minimal preprocessing requirements represent a significant advance toward equitable, reliable, and scalable AI-assisted breast cancer screening. To our knowledge, this is the first paper to model breast cancer late fusion as an optimal transport problem.
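
The abstract does not spell out how the fusion layer is built, so the following is only a minimal sketch of the generic Sinkhorn iterations for entropy-regularized optimal transport that such a layer would presumably rest on. The function name `sinkhorn_plan`, the two-class probability vectors, the 0/1 cost matrix, and the regularization strength `eps` are all illustrative assumptions, not values or components taken from the paper; the dynamic reweighting step is not reproduced here.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.1, n_iters=500):
    """Entropy-regularized transport plan between histograms a and b.

    Generic Sinkhorn iterations; not the paper's fusion layer.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-np.asarray(cost, dtype=float) / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows so row sums approach a
        v = b / (K.T @ u)  # rescale columns so column sums match b
    return u[:, None] * K * v[None, :]

# Hypothetical per-modality class probabilities (benign, malignant).
p_image = np.array([0.2, 0.8])     # assumed output of an image model
p_clinical = np.array([0.4, 0.6])  # assumed output of a clinical model

# Assumed cost: 0 when the two modalities agree on a class, 1 otherwise.
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

plan = sinkhorn_plan(p_image, p_clinical, cost)
transport_cost = float((plan * cost).sum())  # disagreement between modalities
```

In this toy setup the transport cost measures how much probability mass must move for the two modality outputs to agree, and the plan itself can be inspected entry by entry, which is one plausible reading of how a cost matrix could support interpretability.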

Similar Papers
  • Research Article
  • Cite Count Icon 37
  • 10.1007/s13755-021-00151-x
Computer-aided diagnosis of hepatocellular carcinoma fusing imaging and structured health data.
  • May 4, 2021
  • Health Information Science and Systems
  • Alan Baronio Menegotto + 2 more

Hepatocellular carcinoma is the most prevalent primary liver cancer, a silent disease that killed 782,000 people worldwide in 2018. Multimodal deep learning is the application of deep learning techniques that fuse more than one data modality as the model's input. A computer-aided diagnosis system for hepatocellular carcinoma developed with multimodal deep learning approaches could use multiple data modalities, as recommended by clinical guidelines, and enhance the robustness and value of the second opinion given to physicians. This article describes the creation and evaluation of an algorithm for computer-aided diagnosis of hepatocellular carcinoma, developed with multimodal deep learning techniques that fuse preprocessed computed-tomography images with structured data from patient Electronic Health Records. The classification performance achieved by the proposed algorithm on the test dataset was: accuracy = 86.9%, precision = 89.6%, recall = 86.9% and F-Score = 86.7%. These classification performance metrics are close to the state of the art in this area and were achieved with data modalities that are cheaper than traditional Magnetic Resonance Imaging approaches, enabling the use of the proposed algorithm by low and mid-sized healthcare institutions. The classification performance achieved with the multimodal deep learning algorithm is higher than human specialists' diagnostic performance using only CT for diagnosis. Even though the results are promising, the multimodal deep learning architecture used for hepatocellular carcinoma prediction needs further training and testing on different datasets before physicians can use the proposed algorithm in real healthcare routines. The additional training aims to confirm the classification performance achieved and enhance the model's robustness.

  • Research Article
  • Cite Count Icon 2
  • 10.1186/s13058-025-02129-z
Multimodal deep learning model for prediction of breast cancer recurrence risk and correlation with oncotype DX
  • Jan 1, 2025
  • Breast Cancer Research : BCR
  • Ruixin Zhang + 7 more

Background: Proper stratification of recurrence risk in breast cancer is crucial for guiding treatment decisions. This study aims to predict the recurrence risk of breast cancer patients using a multimodal deep learning model that integrates multiple sequence MRI imaging features with clinicopathologic characteristics. Methods: In this retrospective study, we enrolled 574 patients with non-metastatic invasive breast cancer from two Chinese institutions between September 2012 and July 2019. We developed a multimodal deep learning (MDL) model by constructing a multi-instance learning framework based on convolutional neural networks. We integrated imaging features from T2WI, DWI, and DCE-MRI sequences with clinicopathologic features for breast cancer recurrence risk stratification. Subsequently, the performance of the MDL model was evaluated using receiver operating characteristic (ROC) curves, the Hosmer–Lemeshow test, calibration curves, and decision curve analysis (DCA). Survival analysis was conducted with Kaplan–Meier survival curves to stratify breast cancer patients into high and low-recurrence risk groups. Time-dependent ROC curves were used to assess 3-year, 5-year, and 7-year recurrence-free survival (RFS) for breast cancer patients. Additionally, we performed differential and enrichment analyses on Oncotype DX genes. We correlated these genes with clinicopathologic features and deep-learning radiographic features using univariate Cox regression and Pearson correlation analysis. Results: The MDL model demonstrated good performance in predicting breast cancer recurrence risk and accurately differentiated between high- and low-recurrence risk groups, with an AUC as high as 0.915 (95% CI 0.8448–0.9856). The C-index of prediction models was 0.803 in the testing cohort. The AUCs for 5-year and 7-year RFS were 0.936 (95% CI 0.876–0.997) and 0.956 (95% CI 0.902–1.000) in the validation cohort. In the testing cohort, these AUCs were 0.836 (95% CI 0.763–0.909) and 0.783 (95% CI 0.676–0.891). This study found a significant correlation between Oncotype DX gene expression, clinicopathologic features, and deep-learning radiographic features (p < 0.05). Conclusions: This study validated the robust predictive accuracy of the MDL model in identifying high- and low-risk groups for recurrence. The correlations identified between Oncotype DX genes, clinicopathologic features, and deep-learning radiographic features offer novel insights for future biomarker research in breast cancer. Supplementary Information: The online version contains supplementary material available at 10.1186/s13058-025-02129-z.

  • Research Article
  • Cite Count Icon 2
  • 10.1016/j.jcp.2023.112726
Adaptive mesh methods on compact manifolds via Optimal Transport and Optimal Information Transport
  • Dec 27, 2023
  • Journal of Computational Physics
  • Axel G.R Turnquist

  • Research Article
  • Cite Count Icon 12
  • 10.1097/corr.0000000000003030
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
  • Mar 22, 2024
  • Clinical orthopaedics and related research
  • Yu-Ting Pan + 9 more

Level III, prognostic study.

  • Research Article
  • Cite Count Icon 2
  • 10.1007/s10278-025-01566-8
Multimodal Deep Learning Based on Ultrasound Images and Clinical Data for Better Ovarian Cancer Diagnosis.
  • Jun 24, 2025
  • Journal of imaging informatics in medicine
  • Chang Su + 8 more

This study aimed to develop and validate a multimodal deep learning model that leverages 2D grayscale ultrasound (US) images alongside readily available clinical data to improve diagnostic performance for ovarian cancer (OC). A retrospective analysis was conducted involving 1899 patients who underwent preoperative US examinations and subsequent surgeries for adnexal masses between 2019 and 2024. A multimodal deep learning model was constructed to diagnose OC and to extract US morphological features from the images. The model's performance was evaluated using metrics such as receiver operating characteristic (ROC) curves, accuracy, and F1 score. The multimodal deep learning model exhibited superior performance compared to the image-only model, achieving areas under the curve (AUCs) of 0.9393 (95% CI 0.9139-0.9648) and 0.9317 (95% CI 0.9062-0.9573) in the internal and external test sets, respectively. The model significantly improved the AUCs for OC diagnosis by radiologists and enhanced inter-reader agreement. Regarding US morphological feature extraction, the model demonstrated robust performance, attaining accuracies of 86.34% and 85.62% in the internal and external test sets, respectively. Multimodal deep learning has the potential to enhance the diagnostic accuracy and consistency of radiologists in identifying OC. The model's effective feature extraction from ultrasound images underscores the capability of multimodal deep learning to automate the generation of structured ultrasound reports.

  • Research Article
  • 10.1109/tnnls.2024.3462504
Anchor Space Optimal Transport as a Fast Solution to Multiple Optimal Transport Problems.
  • Jan 1, 2024
  • IEEE transactions on neural networks and learning systems
  • Jianming Huang + 3 more

In machine learning, optimal transport (OT) theory is extensively utilized to compare probability distributions across various applications, such as graph data represented by node distributions and image data represented by pixel distributions. In practical scenarios, it is often necessary to solve multiple OT problems. Traditionally, these problems are treated independently, with each OT problem being solved sequentially. However, the computational complexity required to solve a single OT problem is already substantial, making the resolution of multiple OT problems even more challenging. Although many applications of fast solutions to OT are based on the premise of a single OT problem with arbitrary distributions, few efforts handle such multiple OT problems with multiple distributions. Therefore, we propose the anchor space OT (ASOT) problem: an approximate OT problem designed for multiple OT problems. This proposal stems from our finding that in many tasks the mass transport tends to be concentrated in a reduced space from the original feature space. By restricting the mass transport to a learned anchor point space, ASOT avoids pairwise instantiations of cost matrices for multiple OT problems and simplifies the problems by canceling insignificant transports. This simplification greatly reduces its computational costs. We then prove the upper bounds of its 1-Wasserstein distance error between the proposed ASOT and the original OT problem under different conditions. Building upon this accomplishment, we propose three methods to learn anchor spaces for reducing the approximation error. Furthermore, our proposed methods present great advantages for handling distributions of different sizes with GPU parallelization.

  • Research Article
  • Cite Count Icon 171
  • 10.1038/s41746-022-00613-w
Prostate cancer therapy personalization via multi-modal deep learning on randomized phase III clinical trials
  • Jun 8, 2022
  • NPJ Digital Medicine
  • Andre Esteva + 44 more

Prostate cancer is the most frequent cancer in men and a leading cause of cancer death. Determining a patient’s optimal therapy is a challenge, where oncologists must select a therapy with the highest likelihood of success and the lowest likelihood of toxicity. International standards for prognostication rely on non-specific and semi-quantitative tools, commonly leading to over- and under-treatment. Tissue-based molecular biomarkers have attempted to address this, but most have limited validation in prospective randomized trials and expensive processing costs, posing substantial barriers to widespread adoption. There remains a significant need for accurate and scalable tools to support therapy personalization. Here we demonstrate prostate cancer therapy personalization by predicting long-term, clinically relevant outcomes using a multimodal deep learning architecture and train models using clinical data and digital histopathology from prostate biopsies. We train and validate models using five phase III randomized trials conducted across hundreds of clinical centers. Histopathological data was available for 5654 of 7764 randomized patients (71%) with a median follow-up of 11.4 years. Compared to the most common risk-stratification tool, the risk groups developed by the National Comprehensive Cancer Network (NCCN), our models have superior discriminatory performance across all endpoints, ranging from 9.2% to 14.6% relative improvement in a held-out validation set. This artificial intelligence-based tool improves prognostication over standard tools and allows oncologists to computationally predict the likeliest outcomes of specific patients to determine optimal treatment. Outfitted with digital scanners and internet access, any clinic could offer such capabilities, enabling global access to therapy personalization.

  • Book Chapter
  • Cite Count Icon 13
  • 10.1007/978-3-540-44857-0_2
Optimal Shapes and Masses, and Optimal Transportation Problems
  • Jan 1, 2003
  • Giuseppe Buttazzo + 1 more

1 Introduction
2 Some classical problems
  2.1 The isoperimetric problem
  2.2 The Newton’s problem of optimal aerodynamical profiles
  2.3 Optimal Dirichlet regions
  2.4 Optimal mixtures of two conductors
3 Mass optimization problems
4 Optimal transportation problems
  4.1 The optimal mass transportation problem: Monge and Kantorovich formulations
  4.2 The PDE formulation of the mass transportation problem
5 Relationships between optimal mass and optimal transportation
6 Further results and open problems
  6.1 A vectorial example
  6.2 A p-Laplacian approximation
  6.3 Optimization of Dirichlet regions
  6.4 Optimal transporting distances
References

  • Research Article
  • Cite Count Icon 3
  • 10.1007/s43670-025-00097-1
Partial transport for point-cloud registration
  • Feb 18, 2025
  • Sampling Theory, Signal Processing, and Data Analysis
  • Yiku Bai + 3 more

Point cloud registration is an important task in fields like robotics, computer graphics, and medical imaging, involving the determination of spatial relationships between point sets in 3D space. Real-world challenges, such as non-rigid movements and partial visibility, including occlusions and sensor noise, make non-rigid registration particularly difficult. Traditional methods are often computationally intensive, exhibit unstable performance, and lack strong theoretical guarantees. Recently, the optimal transport problem, including its unbalanced variations like the optimal partial transport problem, has emerged as a powerful tool for point-cloud registration. These methods treat point clouds as empirical measures and provide a mathematically rigorous framework to quantify the correspondence between transformed source and target points. In this paper, we address the non-rigid registration problem using optimal transport theory and introduce a set of non-rigid registration methods based on the optimal partial transportation problem. Additionally, by leveraging efficient solutions to the one-dimensional optimal partial transport problem and extending them via slicing, we achieve significant computational efficiency, resulting in fast and robust registration algorithms. We validate our methods by comparing baselines on various 3D and 2D non-rigid registration problems with noisy point clouds.

  • Research Article
  • 10.1002/cam4.71221
Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics
  • Sep 1, 2025
  • Cancer Medicine
  • Shan Fang + 9 more

Background: The pathological response to neoadjuvant chemotherapy (NAC) has become a vital prognostic indicator for patients with breast cancer (BC). The newly generated models depended on rather basic imaging and pathology characteristics and did not sufficiently elucidate the importance of the incorporated data. The purpose of this study is to establish and validate a machine learning model for predicting the pathological complete response (pCR) to NAC using baseline clinical and pathological features in BC patients. Methods: Data were collected from hospitalized BC patients treated with NAC at Zhejiang Provincial People's Hospital between January 2014 and August 2023. The dataset was randomly split, with 70% allocated for model training and 30% for validation. LASSO regression was used to select predictive features. Six ML models, namely XGBoost, LightGBM, CatBoost, logistic regression, random forest (RF), and support vector machine (SVM), were developed, with performance assessed using the area under the curve (AUC), accuracy, precision, recall, F1 score, and Brier score. Clinical benefits were evaluated using decision curve analysis (DCA), and SHapley Additive exPlanation (SHAP) was applied to interpret the features of the optimal ML model. Results: A total of 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). Twelve features, including age, menopausal status, PR, HER2 status, Ki-67 expression, and stromal tumor-infiltrating lymphocytes (sTILs), were selected for model construction. Among the six models, the CatBoost model demonstrated the best predictive performance, achieving an AUC of 0.853 after Bayesian hyperparameter tuning. SHAP analysis ranked sTILs as the most critical predictive feature. In fivefold cross-validation, the CatBoost model incorporating sTILs achieved an average AUC of 0.83. Conclusions: The ML-based pCR prediction model enables more accurate pCR prediction for BC patients at baseline, aiding in optimizing treatment strategies. Additionally, the interpretable SHAP framework enhances model transparency, fostering clinical trust and understanding among doctors.

  • Research Article
  • Cite Count Icon 71
  • 10.1007/s11227-019-03101-3
Multimodal deep learning for finance: integrating and forecasting international stock markets
  • Dec 6, 2019
  • The Journal of Supercomputing
  • Sang Il Lee + 1 more

In today’s increasingly international economy, return and volatility spillover effects across international equity markets are major macroeconomic drivers of stock dynamics. Thus, information regarding foreign markets is one of the most important factors in forecasting domestic stock prices. However, the cross-correlation between domestic and foreign markets is highly complex. Hence, it is extremely difficult to explicitly express this cross-correlation with a dynamical equation. In this study, we develop stock return prediction models that can jointly consider international markets, using multimodal deep learning. Our contributions are threefold: (1) we visualize the transfer information between South Korea and US stock markets by using scatter plots; (2) we incorporate the information into the stock prediction models with the help of multimodal deep learning; (3) we conclusively demonstrate that the early and intermediate fusion models achieve a significant performance boost in comparison with the late fusion and single-modality models. Our study indicates that jointly considering international stock markets can improve the prediction accuracy and deep neural networks are highly effective for such tasks.

  • Research Article
  • Cite Count Icon 14
  • 10.1186/s12911-021-01700-w
Prediction of central venous catheter-associated deep venous thrombosis in pediatric critical care settings
  • Nov 27, 2021
  • BMC Medical Informatics and Decision Making
  • Haomin Li + 6 more

Background: An increase in the incidence of central venous catheter (CVC)-associated deep venous thrombosis (CADVT) has been reported in pediatric patients over the past decade. At the same time, current screening guidelines for venous thromboembolism risk have low sensitivity for CADVT in hospitalized children. This study utilized a multimodal deep learning model to predict CADVT before it occurs. Methods: Children who were admitted to intensive care units (ICUs) between December 2015 and December 2018 and had a CVC in place for at least 3 days were included. The variables analyzed included demographic characteristics, clinical conditions, laboratory test results, vital signs and medications. A multimodal deep learning (MMDL) model that can handle temporal data using long short-term memory (LSTM) and gated recurrent units (GRUs) was proposed for this prediction task. Four benchmark machine learning models, logistic regression (LR), random forest (RF), gradient boosting decision tree (GBDT) and a published cutting-edge MMDL, were used to compare and evaluate the models with a fivefold cross-validation approach. Accuracy, recall, area under the ROC curve (AUC), and average precision (AP) were used to evaluate the discrimination of each model at three time points (24 h, 48 h and 72 h) before CADVT occurred. Brier score and Spiegelhalter’s z test were used to measure the calibration of these prediction models. Results: A total of 1830 patients were included in this study, and approximately 15% developed CADVT. In the CADVT prediction task, the model proposed in this paper significantly outperforms both traditional machine learning models and existing multimodal deep learning models at all 3 time points. It achieved 77% accuracy and 90% recall at 24 h before CADVT was discovered. It can be used to accurately predict the occurrence of CADVT 72 h in advance with an accuracy of greater than 75%, a recall of more than 87%, and an AUC value of 0.82. Conclusion: In this study, a machine learning method was successfully established to predict CADVT in advance. These findings demonstrate that artificial intelligence (AI) could provide measures for thromboprophylaxis in a pediatric intensive care setting.

  • Supplementary Content
  • Cite Count Icon 26
  • 10.1093/genetics/iyae161
A review of multimodal deep learning methods for genomic-enabled prediction in plant breeding
  • Nov 5, 2024
  • Genetics
  • Osval A Montesinos-López + 9 more

Deep learning methods have been applied when working to enhance the prediction accuracy of traditional statistical methods in the field of plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has proven to have some limitations, since its conventional methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce some theoretical basic concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as the available strategies to fuse data from different modalities. We mention some of the available computational resources for the practical implementation of multimodal deep learning problems. We finally performed a review of applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the practical performance of multimodal deep learning methods to highlight how these tools can help address complex problems in the field of plant breeding. We discussed some relevant considerations that researchers should keep in mind when applying multimodal deep learning methods. Multimodal deep learning holds significant potential for various fields, including genomic selection. While multimodal deep learning displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply multimodal deep learning in genomic selection, suitable architectures and fusion strategies must be chosen. It is relevant to keep in mind that multimodal deep learning, like unimodal deep learning, is a powerful tool but should be carefully applied. Given its predictive edge over traditional methods, multimodal deep learning is valuable in addressing challenges in plant breeding and food security amid a growing global population.

  • Research Article
  • Cite Count Icon 155
  • 10.1145/3545572
A Review on Methods and Applications in Multimodal Deep Learning
  • Feb 17, 2023
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Summaira Jabeen + 5 more

Deep learning has been applied in a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite the extensive development of unimodal learning, it still cannot cover all the aspects of human learning. Multimodal learning helps to understand and analyze better when various senses are engaged in the processing of information. This article focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, physiological signals, flow, RGB, pose, depth, mesh, and point cloud. A detailed analysis of the baseline approaches and an in-depth study of recent advancements during the past five years (2017 to 2021) in multimodal deep learning applications are provided. A fine-grained taxonomy of various multimodal deep learning methods is proposed, elaborating on different applications in more depth. Last, main issues are highlighted separately for each domain, along with their possible future research directions.

  • Research Article
  • Cite Count Icon 27
  • 10.1097/corr.0000000000001367
CORR Synthesis: When Should We Be Skeptical of Clinical Prediction Models?
  • Jun 10, 2020
  • Clinical Orthopaedics & Related Research
  • Aditya V Karhade + 1 more
