Articles published on Multi-modal Deep Learning Models
355 Search results
- Research Article
- 10.1016/j.ejrad.2026.112758
- Jun 1, 2026
- European journal of radiology
- Zhiqiang Wan + 8 more
Multi-modal deep learning model for predicting recurrence of moderately severe and severe acute pancreatitis.
- Research Article
- 10.1186/s12880-026-02411-2
- May 12, 2026
- BMC medical imaging
- Jia Peng + 9 more
Lung adenocarcinoma presenting as ground-glass nodules (GGNs) comprises three invasive subtypes (adenocarcinoma in situ [AIS], minimally invasive adenocarcinoma [MIA], invasive adenocarcinoma [IAC]) with distinct prognoses and management strategies. Preoperative discrimination of these subtypes remains challenging for radiologists, and existing deep learning models rarely integrate multi-modal data for reliable prediction. This study aimed to develop and internally validate a multi-modal fusion framework based on the standard ResNet50 architecture, integrating CT images, clinical variables, and tumor markers, to improve the preoperative prediction of ground-glass nodule invasiveness. A retrospective study was conducted including 431 patients with pathologically confirmed ground-glass nodules. All patients underwent standard chest computed tomography before surgery. A multi-modal deep learning model was constructed based on the ResNet50 network, combined with clinical characteristics and laboratory indicators. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve, precision, recall, and F1-score with five-fold cross-validation. The proposed multi-modal model achieved an overall accuracy of 72.2%, precision of 95.6%, negative predictive value of 96.0%, weighted F1-score of 40.0%, and multiclass Matthews correlation coefficient of 73.1% in the three-class classification of AIS, MIA, and IAC. Per-class analysis showed precision of 84.6%, 35.7%, and 84.4% and recall of 57.9%, 29.4%, and 81.8% for AIS, MIA, and IAC, respectively. The fusion model yielded a macro-average AUC of 0.87, which was higher than the CT-only model (0.79) and both the senior (0.67) and junior radiologists (0.57). The model demonstrated superior diagnostic performance compared to human readers, particularly for the challenging MIA subtype. 
This multi-modal deep learning model combining CT images, clinical variables, and serum tumor markers enables accurate and robust three-class classification of AIS, MIA, and IAC in ground-glass nodules. The proposed model outperforms both human radiologists and the imaging-only model, suggesting its potential as a reliable auxiliary tool to improve preoperative prediction of lung adenocarcinoma invasiveness and assist clinical decision-making.
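The per-class precision and recall figures quoted above follow directly from a three-class confusion matrix. As a minimal sketch, assuming rows are true classes and columns are predicted classes, the calculation could look like the following. The function name and the toy counts are illustrative assumptions, not the study's code or data.

```python
from typing import Dict, List


def per_class_precision_recall(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class precision and recall; rows = true class, columns = predicted class."""
    n = len(labels)
    results: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])  # row sum
        results[name] = {
            "precision": true_positive / predicted_as_k if predicted_as_k else 0.0,
            "recall": true_positive / actually_k if actually_k else 0.0,
        }
    return results


# Hypothetical counts for illustration only (NOT taken from the study).
toy_confusion = [
    [8, 2, 0],   # true AIS
    [3, 4, 3],   # true MIA
    [0, 2, 18],  # true IAC
]
metrics = per_class_precision_recall(toy_confusion, ["AIS", "MIA", "IAC"])
```

In a toy matrix like this one, the middle class is confused with both neighbours, which is the usual reason an intermediate category such as MIA shows the lowest precision and recall.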
- Research Article
- 10.1186/s12885-026-16116-w
- May 11, 2026
- BMC cancer
- Ziming Yin + 8 more
Gallbladder cancer (GBC) is a rare gastrointestinal malignancy with a global 5-year survival rate of less than 5%. Early diagnosis is challenging owing to the lack of specific clinical symptoms. Additionally, the high heterogeneity of gallbladder tumors limits the clinical utility of unimodal deep-learning methods for GBC diagnosis. This study aimed to develop a novel multimodal deep-learning model to facilitate the preoperative diagnosis of GBC in more patients. We conducted a retrospective multicenter study using contrast-enhanced arterial phase computed tomography (CT) images and laboratory examination data from 300 patients (150 GBC cases and 150 non-GBC cases) extracted from electronic medical records of two Grade A tertiary hospitals in Shanghai between 2018 and 2020. A novel two-stage multimodal diagnostic model (GBC-DiagNet) was developed: the first stage achieved coarse segmentation of the gallbladder region using a position-constrained 3D Attention U-Net (improved by combined sampling) to avoid over-segmentation; the second stage realized GBC detection via an adaptive feature fusion strategy, which optimizes the weighted integration of handcrafted radiomic, deep radiomic and laboratory examination features to enhance diagnostic performance. On the independent test set, the model achieved an accuracy of 0.933 (95% confidence interval [95% CI]: 0.927-0.94), specificity of 0.912 (95% CI: 0.904-0.922), sensitivity of 0.962 (95% CI: 0.937-0.986), precision of 0.893 (95% CI: 0.875-0.911), an F1-score of 0.926 (95% CI: 0.919-0.932) and AUC (area under the curve) of 0.9706 (95% CI: 0.961-0.981). Compared with the optimal unimodal model, our model improved accuracy, sensitivity, and F1-score by 14.28%, 16.76%, and 16.85%, respectively. Furthermore, compared to state-of-the-art deep-learning architectures (ResNet, DenseNet, MobileNet, ConvNeXt, ViT), our model exhibited absolute improvements of 7.68% in accuracy, 8.03% in F1-score, and 0.0059 in AUC. 
The proposed multimodal model integrating contrast-enhanced CT and laboratory data achieves stable and clinically meaningful diagnostic performance for gallbladder cancer, supporting its utility as an artificial intelligence-assisted tool for preoperative noninvasive diagnosis.
- Research Article
- 10.1038/s41433-026-04505-1
- May 6, 2026
- Eye (London, England)
- Je Moon Yoon + 8 more
To develop and validate a multimodal deep learning model that predicts treatment responses to intravitreal anti-vascular endothelial growth factor (anti-VEGF) injections in patients with diabetic macular oedema (DMO) by combining optical coherence tomography images and clinical data. This study included 107 DMO patients who received three consecutive anti-VEGF treatments. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity. The model's predictions were compared with those of retinal specialists. Among 107 patients, 65 showed good response and 42 showed poor response to treatment. The multimodal model achieved an AUROC of 0.962 (95% CI, 0.945-0.979), accuracy of 0.953 (95% CI, 0.933-0.973), sensitivity of 0.969 (95% CI, 0.951-0.987), and specificity of 0.928 (95% CI, 0.903-0.953) in the internal validation. The model outperformed retinal specialists, who achieved accuracies ranging from 0.571 to 0.857. The multimodal deep learning model demonstrated high accuracy in predicting anti-VEGF treatment responses in DMO patients. This approach could enable more personalised treatment strategies and optimal resource utilisation in ophthalmological care. Further validation with larger, multicentre datasets is warranted to confirm its clinical utility.
- Research Article
- 10.1038/s41746-026-02697-0
- May 4, 2026
- NPJ digital medicine
- Shiwei Luo + 8 more
Accurate classification of renal masses before treatment is crucial for therapeutic decision-making and patient outcome. This study developed and validated Multi-Phase Attention Network (MPANet), a multimodal deep learning model integrating multiphase contrast-enhanced CT and clinical information, which can utilize both complete-phase and missing-phase CT data for multiclass classification of four common and easily confusable renal tumors: clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (pRCC), oncocytic neoplasms (including chromophobe renal cell carcinoma (chRCC) and renal oncocytoma (RO)), and fat-poor angiomyolipoma (fpAML). A total of 1688 multi-center cases were enrolled. Across all test sets, MPANet consistently outperformed single-phase models. In the internal test set, MPANet achieved a macro-average AUC of 0.850, a micro-average AUC of 0.865, and an accuracy of 73.3%. These results compared favorably to assessments by four radiologists based on CT (accuracies 43.6-62.4%) and two radiologists using MRI with clear cell likelihood score (ccLS) system (accuracies 52.5% and 49.5%). The net improvement rate of MPANet over radiologist assessment ranged from 10.9% to 29.7%. In the two external test sets, macro-average AUCs were 0.811 and 0.813, and micro-average AUCs were 0.867 and 0.909, respectively. MPANet shows potential as a clinical decision-support tool for personalized renal tumor diagnosis.
- Research Article
- 10.1093/bib/bbag224
- May 3, 2026
- Briefings in bioinformatics
- Po-Chun Chiu + 5 more
Ovarian cancer represents the primary cause of mortality from gynecological malignancies among women. Treatment strategies for benign versus malignant ovarian tumors differ significantly, making accurate preoperative diagnosis essential for clinical decision-making. Traditional ultrasound diagnosis is highly operator-dependent, introducing subjectivity and variability. To improve diagnostic precision in ovarian tumor classification, we developed a multimodal deep learning system that combines ultrasound images with corresponding clinical text reports. We retrospectively analyzed 1342 ultrasound images from 1062 patients who received surgical treatment for ovarian tumors at National Taiwan University Hospital from 2011 to 2021. Patients were classified into benign (n = 612) and malignant (including borderline, n = 450) groups based on pathology. A multimodal deep learning architecture was developed, incorporating DenseNet-121 and Swin Transformer for image feature extraction and Bio-Clinical BERT for processing clinical text reports. The dataset was split using subject-level stratification with five-fold cross-validation and a 15% independent test set. Furthermore, an external validation cohort of 268 effective cases from 3 independent medical centers was utilized to evaluate the model's generalizability. The multimodal model achieved superior performance at the subject level with 81.77% (95% CI: 75.89%, 86.48%) accuracy, 79.59% (95% CI: 70.57%, 86.38%) sensitivity, 83.81% (95% CI: 75.59%, 89.64%) specificity, and an area under the curve (AUC) of 0.88 (95% CI: 0.83, 0.93). In the external validation, the model maintained robust performance with an accuracy of 88.81%, sensitivity of 92.59%, and specificity of 84.96%, outperforming the International Ovarian Tumor Analysis Simple Rules (accuracy 86.4%). Integration of clinical text information significantly improved diagnostic performance compared to image-only models. 
Backward selection analysis revealed that both uterine findings and ovarian tumor descriptions contributed synergistically to the final diagnosis. This study successfully developed a multimodal deep learning model with diagnostic performance superior to traditional operator-dependent approaches. The model shows promise as a diagnostic tool for ovarian tumor classification, offering clinicians a way to improve preoperative diagnostic accuracy and enhance patient care quality.
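Because this cohort has more images than patients, a subject-level split matters: every image from one patient must land in the same fold, or validation performance is inflated by leakage. A minimal sketch of a stratified, subject-level fold assignment is shown below. The function name, the round-robin assignment, and the toy identifiers are assumptions for illustration; the abstract does not describe the authors' actual procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List


def subject_level_folds(
    image_to_patient: Dict[str, str],
    patient_label: Dict[str, str],
    n_folds: int = 5,
    seed: int = 0,
) -> Dict[str, int]:
    """Assign each image a fold so that all images of one patient share a fold.

    Patients are shuffled within each label and dealt round-robin, which keeps
    the label balance roughly equal across folds.
    """
    rng = random.Random(seed)
    by_label: Dict[str, List[str]] = defaultdict(list)
    for patient in sorted(set(image_to_patient.values())):
        by_label[patient_label[patient]].append(patient)

    patient_fold: Dict[str, int] = {}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        for i, patient in enumerate(group):
            patient_fold[patient] = i % n_folds

    return {image: patient_fold[p] for image, p in image_to_patient.items()}


# Hypothetical toy data: 10 patients with 2 images each.
toy_images = {f"img{i}": f"pt{i // 2}" for i in range(20)}
toy_labels = {f"pt{j}": ("malignant" if j % 2 else "benign") for j in range(10)}
folds = subject_level_folds(toy_images, toy_labels, n_folds=5)
```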
- Research Article
- 10.1016/j.jbi.2026.105001
- May 1, 2026
- Journal of biomedical informatics
- Jennifer Martin + 9 more
Explainable multimodal deep learning models for variable-length sequences in critically ill patients.
- Research Article
- 10.3340/jkns.2026.0085
- May 1, 2026
- Journal of Korean Neurosurgical Society
- Joo Whan Kim
Pediatric neuro-oncology is a critical field of neurosurgery, as pediatric central nervous system tumors are the leading cause of disease-related mortality in children. Despite their rarity, these tumors encompass over 100 diverse disease entities, which significantly complicates preoperative differential diagnosis and surgical planning. This review examines how artificial intelligence (AI) can address these unmet clinical challenges throughout the perioperative period. Preoperatively, AI-driven radiogenomic models extract pixel-level features to enable non-invasive molecular subtyping, such as predicting B-Raf proto-oncogene alteration status in pediatric low-grade gliomas (pLGGs). Such insights are vital for determining the extent of resection (EOR) while taking the availability of targeted therapies into account. Furthermore, AI facilitates automated tumor segmentation, allowing for meticulous surgical planning and more accurate assessment of surgical risks. Intraoperatively, AI significantly accelerates diagnostic turnaround times, which is essential for real-time decision-making. Emerging technologies, including Oxford Nanopore sequencing with neural network classifiers and stimulated Raman histology, allow for the rapid identification of tumor characteristics within the operative time window. These tools directly inform the optimal EOR, particularly in cases like medulloblastoma where molecular subgroups dictate surgical aggressiveness. Additionally, AI integration into intraoperative neurophysiological monitoring enhances the preservation of critical neurological functions. Postoperatively, multimodal deep learning models integrate clinical, imaging, and genomic data to improve prognostic accuracy and standardize response assessment. While challenges such as data scarcity and the "black box" nature of algorithms persist, innovative strategies offer potential solutions to AI application. 
AI serves as a transformative tool for personalized precision management, potentially bridging diagnostic disparities and optimizing clinical outcomes for children with central nervous system tumors.
- Research Article
- 10.1016/j.ejca.2026.116679
- May 1, 2026
- European journal of cancer (Oxford, England : 1990)
- Xiangxue Wang + 14 more
MuTriM: A multiscale deep learning model integrating longitudinal radiomics and pathomic features for predicting recurrence and adjuvant radiation benefit in breast cancer.
- Research Article
- 10.1007/s10278-026-01980-6
- Apr 29, 2026
- Journal of imaging informatics in medicine
- Quentin Vanderbecq + 6 more
This study aims to develop and externally validate a multimodal AI model for detecting ischemia complicating small-bowel obstruction (SBO). We combined 3D CT data with routine laboratory markers (C-reactive protein, neutrophil count) and, optionally, radiology report indication/history text. From two centers, 1350 CT examinations were curated; 771 confirmed SBO scans were used for model development with patient-level splits. Ischemia labels were defined by surgical confirmation within 24 h of imaging. Models (MViT, ResNet-101, DaViT) were trained as unimodal and multimodal variants. External testing used 66 independent cases from a third center. Four radiologists (two residents and two experts) read the test set with and without AI assistance. Performance was assessed using AUC, sensitivity, specificity, and 95% bootstrap confidence intervals; predictions included a confidence score. The image-plus-laboratory model performed best on external testing (AUC 0.69 [0.59-0.79], sensitivity 0.89 [0.76-1.00], and specificity 0.44 [0.35-0.54]). Adding report text improved internal validation but did not generalize externally; image + text and full multimodal variants did not exceed image + laboratory performance. Across readers, baseline AUC ranged from 0.496 [0.361-0.640] to 0.745 [0.589-0.875] and increased with reader experience. With AI assistance, AUC ranged from 0.565 [0.419-0.717] to 0.845 [0.714-0.952] and from 0.519 [0.373-0.669] to 0.845 [0.708-0.954] when confidence scores were displayed, showing consistent but non-significant changes regardless of experience level. A multimodal model combining CT and lab data surpassed unimodal approaches for 24-h ischemia detection; as a triage-support tool, it showed a consistent but non-significant improvement in radiologist performance.
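The bracketed ranges above are 95% bootstrap confidence intervals. As a rough sketch of how a percentile bootstrap interval for sensitivity can be obtained, assuming binary labels and predictions, one could write the following. The helper names, the resample count, and the toy labels are assumptions; the abstract does not state which bootstrap variant the authors used.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate; NaN when the sample contains no positive cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:  # drop NaN resamples (no positives drawn)
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical toy data (NOT the study's cases): 20 positives, 40 negatives.
toy_true = [1] * 20 + [0] * 40
toy_pred = [1] * 17 + [0] * 3 + [1] * 22 + [0] * 18
point = sensitivity(toy_true, toy_pred)
ci_low, ci_high = bootstrap_ci(toy_true, toy_pred, sensitivity)
```

With few positive cases the interval is wide, which is consistent with the broad sensitivity range reported for the 66-case external set.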
- Research Article
- 10.3389/fneur.2026.1791696
- Apr 21, 2026
- Frontiers in neurology
- Chaojun Chen + 4 more
Identifying multiple sclerosis (MS) in children early is critical, as early therapeutic intervention can improve outcomes. The anterior visual pathway has been demonstrated to be of central importance in diagnostic considerations for MS and has recently been identified as a fifth topography in the McDonald Diagnostic Criteria for MS. Optical coherence tomography (OCT) provides high-resolution retinal imaging and reflects the structural integrity of the retinal nerve fiber and ganglion cell inner plexiform layers. Whether multimodal deep learning models can use OCT alone to diagnose pediatric onset MS (POMS) is unknown. We analyzed 3D OCT scans collected prospectively through the Neuroinflammatory Registry of the Hospital for Sick Children (REB#1000005356). Raw macular and optic nerve head images, and 52 automatically segmented features were included. We evaluated three classification approaches: (1) deep learning models (e.g., ResNet, DenseNet) for representation learning followed by classical ML classifiers, (2) ML models trained on OCT-derived features, and (3) multimodal models combining both via early and late fusion. Scans from individuals with POMS (onset 16.0 ± 3.1 years, 51.0% female; 211 scans) and 29 children with non-inflammatory neurological conditions (13.1 ± 4.0 years, 69.0% female, 52 scans) were included. The early fusion model achieved the highest performance (AUC: 0.90, weighted F1: 0.87, macro F1: 0.77, accuracy: 87%), outperforming both unimodal and late fusion models. The best unimodal feature-based model (SVC) yielded an AUC of 0.84, weighted F1 of 0.85, macro F1 of 0.73, and accuracy of 85%, while the best image-based model (ResNet101 with SVC) achieved an AUC of 0.79, weighted F1 of 0.84, macro F1 of 0.70, and accuracy of 87%. Late fusion underperformed, reaching 82% accuracy but failing in the minority class. 
Multimodal learning with early fusion significantly enhances diagnostic performance by combining spatial retinal information with clinically relevant structural features. This approach captures complementary patterns associated with MS pathology and shows promise as an AI-driven tool to support pediatric neuroinflammatory diagnosis.
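The early versus late fusion distinction drawn above can be made concrete with a small sketch. Early fusion concatenates the image embedding and the tabular features before a single classifier sees them; late fusion trains one classifier per modality and combines their output probabilities. The logistic scorers, the equal-weight averaging, and all weights below are illustrative assumptions, not the study's models.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(weights, features))


def early_fusion_probability(
    image_embedding: Sequence[float],
    tabular_features: Sequence[float],
    fused_weights: Sequence[float],
) -> float:
    """One classifier over the concatenated feature vector."""
    fused = list(image_embedding) + list(tabular_features)
    if len(fused_weights) != len(fused):
        raise ValueError("weights must match the fused feature length")
    return sigmoid(linear_score(fused_weights, fused))


def late_fusion_probability(
    image_embedding: Sequence[float],
    tabular_features: Sequence[float],
    image_weights: Sequence[float],
    tabular_weights: Sequence[float],
) -> float:
    """One classifier per modality; probabilities are averaged afterwards."""
    p_image = sigmoid(linear_score(image_weights, image_embedding))
    p_tabular = sigmoid(linear_score(tabular_weights, tabular_features))
    return 0.5 * (p_image + p_tabular)


# Hypothetical inputs and weights, for illustration only.
embedding = [0.2, -0.1]
features = [1.0]
p_early = early_fusion_probability(embedding, features, [1.0, 1.0, 0.5])
p_late = late_fusion_probability(embedding, features, [1.0, 1.0], [0.5])
```

The practical difference is that the early-fusion classifier can learn interactions between image and tabular features, whereas averaging per-modality probabilities cannot.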
- Research Article
- 10.3390/biomedicines14040946
- Apr 21, 2026
- Biomedicines
- Yijun Yao + 7 more
Background/Objectives: Transcatheter aortic valve implantation (TAVI) in patients with bicuspid aortic valve (BAV) remains associated with higher rates of residual paravalvular leak (PVL), which confers a two-fold increase in mortality. Despite procedural optimization including balloon post-dilatation, a subset of patients exhibit residual ≥moderate PVL. Pre-procedural identification of these patients could guide procedural planning. Methods: We retrospectively analyzed 402 BAV patients who underwent TAVI with self-expanding valves and balloon post-dilatation between January 2016 and June 2024. A multi-modal deep learning model (Model B) was developed, integrating a 3D ResNet encoder for computed tomography (CT) imaging features with a multilayer perceptron (MLP) for clinical variables, fused via a cross-attention mechanism. Its performance was compared against a conventional model (Model A) combining clinical variables with manually derived CT measurements. Both models were evaluated on identical test folds using 5-fold stratified cross-validation. Results: Of 402 patients, 36 (9.0%) had residual ≥moderate PVL, associated with significantly larger aortic root dimensions at all anatomical levels and greater aortic valve calcification volume (median 887.6 vs. 559.2 mm³; p = 0.004). Model A achieved a mean AUC of 0.694 (95% CI: 0.596-0.792). Model B achieved a mean AUC of 0.822 (95% CI: 0.680-0.964), with a specificity of 0.971, accuracy of 0.881, and PPV of 0.860, while sensitivity was 0.429, reflecting the limited number of outcome events in this cohort. Conclusions: A multi-modal deep learning model integrating expert-segmented CT imaging with clinical variables demonstrated significantly improved discrimination over the conventional approach in this internal cohort for predicting residual PVL in BAV-TAVI, supporting the integration of segmentation-guided deep learning into pre-procedural TAVI planning. 
However, given the modest number of outcome events, external validation is required to confirm the generalizability of these findings.
- Research Article
- 10.3390/brainsci16040405
- Apr 10, 2026
- Brain sciences
- Chiara Camastra + 3 more
Background/Objectives: Multimodal data fusion is increasingly applied in neuroinformatics to integrate heterogeneous sources of information. However, the optimal strategies for combining modalities with markedly different dimensionality, scale, and noise characteristics remain unclear. To our knowledge, this is among the first systematic and controlled benchmarks explicitly disentangling the effects of fusion strategy and feature scaling within a unified deep learning framework. Methods: Using data from 747 healthy participants from the Human Connectome Project, we evaluated multiple fusion paradigms (including early fusion, attention-based fusion, subspace-based fusion, and graph-based fusion) within a unified and reproducible framework. Importantly, we assessed how different feature scaling techniques (Standard, Min-Max, and Robust scaling) interact with fusion strategies and influence model performance. Biological sex was used as a controlled benchmark task to focus on methodological insights rather than task-specific optimization. Results: Early feature-level fusion consistently achieved the highest classification performance across all evaluated configurations. In particular, direct concatenation of cognitive and neuroimaging features combined with Standard Scaling yielded the best results (AUC-ROC = 0.96 (0.95-0.96)), outperforming unimodal baselines as well as intermediate and late fusion strategies. Conclusions: This systematic benchmark demonstrates that multimodal deep learning performance in neuroscience is driven primarily by the interaction between fusion strategy and feature scaling rather than by architectural complexity alone. By explicitly disentangling the effects of fusion level and preprocessing within a unified framework, this study provides practical methodological guidance for the design, evaluation, and reproducible deployment of multimodal deep learning models in neuroscience.
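To illustrate why the choice of scaler interacts with early fusion, the sketch below applies the three named scaling schemes column by column and then concatenates two modalities row-wise. It uses the common textbook definitions (z-score, min-max to [0, 1], and median/IQR scaling); the helper names and toy values are assumptions, and the study's exact preprocessing is not given in the abstract.

```python
import statistics
from typing import Callable, List, Sequence

Scaler = Callable[[Sequence[float]], List[float]]


def standard_scale(column: Sequence[float]) -> List[float]:
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd if sd else 0.0 for x in column]


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Map the column onto [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]


def robust_scale(column: Sequence[float]) -> List[float]:
    """Subtract the median, divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - median) / iqr if iqr else 0.0 for x in column]


def scale_columns(rows: Sequence[Sequence[float]], scaler: Scaler) -> List[List[float]]:
    """Apply a scaler to every feature column of a row-major table."""
    columns = [scaler(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical toy modalities; the first contains one extreme outlier.
cognitive = [[1.0], [2.0], [3.0], [4.0], [100.0]]
imaging = [[10.0, 0.0], [20.0, 1.0], [30.0, 0.0], [40.0, 1.0], [50.0, 0.0]]

robust_cognitive = scale_columns(cognitive, robust_scale)
# Early fusion: scale each modality, then concatenate per participant.
fused = [
    c + i
    for c, i in zip(robust_cognitive, scale_columns(imaging, standard_scale))
]
```

In this toy column the outlier compresses the min-max output for every other value toward zero, while robust scaling leaves the bulk of the values on a unit-like scale; differences of this kind change how much each modality contributes after concatenation.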
- Research Article
- 10.3390/cancers18081194
- Apr 8, 2026
- Cancers
- Simon Baur + 13 more
Background/Objectives: Peptide receptor radionuclide therapy (PRRT) is an established treatment for metastatic neuroendocrine tumors (NETs), yet long-term disease control occurs only in a subset of patients. Predicting progression-free survival (PFS) could support individualized treatment planning. This study evaluates laboratory, imaging, and multimodal deep learning models for PFS prediction in PRRT-treated patients. Methods: In this retrospective, single-center study, 116 patients with metastatic NETs undergoing [177Lu]Lu-DOTATOC were included. Clinical characteristics, laboratory values, and pretherapeutic somatostatin receptor positron emission tomography/computed tomographies (SR-PET/CTs) were collected. Seven models were trained to classify low- vs. high-PFS groups, including unimodal (laboratory, SR-PET, or CT) and multimodal fusion approaches. Performance was assessed via repeated 3-fold cross-validation with area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Explainability was evaluated by feature importance analysis and gradient-based saliency maps. Results: Forty-two patients (36%) displayed short PFS (≤1 year) and 74 patients displayed long PFS (>1 year). Groups were similar in most characteristics, except for higher baseline chromogranin A (p = 0.003), elevated γ-GT (p = 0.002), and fewer PRRT cycles (p < 0.001) in short-PFS patients. The Random Forest model trained only on laboratory biomarkers reached an AUROC of 0.59 ± 0.02. Unimodal three-dimensional convolutional neural networks using SR-PET or CT performed worse (AUROC 0.42 ± 0.03 and 0.54 ± 0.01, respectively). A multimodal fusion model integrating laboratory values, SR-PET, and CT (augmented with a pretrained CT branch) achieved the best results (AUROC 0.72 ± 0.01, AUPRC 0.80 ± 0.01). 
Explainability analyses provided insights into model predictions, with explainability patterns in the fusion model appearing physiologically plausible and predominantly tumor-focused. Conclusions: Multimodal deep learning combining SR-PET, CT, and laboratory biomarkers outperformed unimodal approaches for PFS prediction after PRRT. Upon external validation, such models may support risk-adapted follow-up strategies.
- Research Article
- 10.1186/s12880-026-02312-4
- Apr 6, 2026
- BMC medical imaging
- Lei Lai + 10 more
Ovarian cancer is the second deadliest cancer affecting women globally, and precise and timely classification of ovarian tumors plays an instrumental role in improving cure rates and reducing mortality. This study set out to comprehensively investigate the effectiveness of a deep learning model for classifying benign and malignant ovarian tumors, utilizing multimodal ultrasound images and clinical data, in comparison to traditional methods such as manual assessment by radiologists and those based on O-RADS. This retrospective multicenter study recruited women diagnosed with ovarian tumors between January 2022 and June 2023, with histopathological examination results as the reference diagnoses. The dataset was divided into three subsets: training (70%), validation (10%), and test (20%). Employing the Dense Convolutional Network algorithm, we constructed and investigated two fusion models: DLM2F, integrating multimodal features extracted from ultrasound (grayscale ultrasound, color Doppler flow imaging), and DLM3F, integrating DLM2F with clinical data (e.g. age, CA125, CA199, HE4, SCC, ROMA index, menopausal state, and mass volume). The outcome measure was the area under the receiver operating characteristic curve (AUC). We compared the models' performance in the test dataset against radiologists, O-RADS, and single-mode models. A total of 508 patients with ovarian tumors (mean age: 44.3 ± 15.9 years) were enrolled, including 327 benign and 181 malignant tumors. In the test set, the DLM2F model demonstrated an AUC of 0.919, sensitivity of 0.865 and specificity of 0.879, while the DLM3F model showed an AUC of 0.951, sensitivity of 0.865 and specificity of 0.939. Comparatively, radiologists achieved AUCs of 0.896 (Expert level III) and 0.827 (Expert level I), while O-RADS achieved an AUC of 0.835. 
Evaluation of confusion matrices revealed that the DLM3F model exhibited accuracy almost identical to that of a level III expert, demonstrating its promising potential as a clinical diagnostic tool to assist junior radiologists. The deep learning model integrating multimodal ultrasound images and clinical information is capable of discriminating between benign and malignant ovarian tumors, exceeding the diagnostic capabilities of both radiologists and O-RADS assessments.
- Research Article
- 10.3174/ajnr.a9016
- Apr 2, 2026
- AJNR. American journal of neuroradiology
- Hongxi Yang + 11 more
Predicting the final location and volume of lesions in acute ischemic stroke is crucial for clinical management. While CTP is routinely used for estimating lesion outcomes, conventional threshold-based methods have limitations. We developed specialized outcome-prediction deep learning models that predict infarct core in successful reperfusion cases and the combined core-penumbra region in unsuccessful reperfusion cases. We developed single-modal and multimodal deep learning models using CTP parameter maps to predict the final infarct lesion on follow-up DWI. Using a multicenter data set from multiple sites, we developed deep learning models and evaluated them separately for patients with complete recanalization (successful reperfusion [CR], n = 350) and no recanalization (unsuccessful reperfusion [NR], n = 138) after treatment. The CR model was designed to predict the infarct core region, while the NR model predicted the expanded, hypoperfused tissue encompassing both the core and penumbra regions. Five-fold cross-validation was performed for robust evaluation. The multimodal 3D nnU-Net model demonstrated superior performance, achieving mean Dice scores of 35.36% in patients with CR and 50.22% in those with NR. This model substantially outperformed the current clinically used method, providing more accurate outcome estimates than the conventional single-technique threshold-based measures, which yielded Dice scores of 15.73% and 39.71% for CR and NR groups, respectively. Our approach offered both successful reperfusion and unsuccessful reperfusion estimations for potential treatment outcomes, enabling clinicians to better evaluate treatment eligibility for reperfusion therapies and assess potential treatment benefits. 
This advancement facilitates more personalized treatment recommendations and has the potential to substantially enhance clinical decision-making in acute ischemic stroke management by providing more accurate tissue outcome predictions than conventional single-technique threshold-based approaches.
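The Dice scores reported above measure the overlap between a predicted lesion mask and the reference lesion on follow-up imaging: twice the shared volume divided by the sum of the two volumes. A minimal sketch over flattened binary voxel masks is given below. The function name, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions for illustration.

```python
from typing import Sequence


def dice_score(pred_mask: Sequence[int], true_mask: Sequence[int]) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 2.0 * intersection / total if total else 1.0


# Hypothetical 6-voxel masks, for illustration only.
predicted = [1, 1, 0, 0, 0, 0]
reference = [1, 0, 0, 1, 0, 0]
score = dice_score(predicted, reference)  # 2 * 1 / (2 + 2)
```

Dice penalizes small lesions heavily, since a displacement of a few voxels removes most of the overlap; this is one reason the metric tends to be lower for infarct-core prediction than for larger hypoperfused regions.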
- Research Article
- 10.1186/s12874-026-02845-w
- Apr 2, 2026
- BMC Medical Research Methodology
- Lixin Liu + 5 more
DSPONVNet: a multimodal deep learning model integrating intraoperative monitoring and clinical features for predicting postoperative nausea and vomiting risk
- Research Article
- 10.1007/s00595-025-03152-5
- Apr 1, 2026
- Surgery today
- Yasuharu Shinozaki + 10 more
This study aimed to assess the performance of a deep learning model using multimodal imaging for detecting lymph node metastasis in esophageal cancer in comparison to expert assessments. A retrospective analysis was performed for 521 lymph nodes from 167 patients with esophageal cancer who underwent esophagectomy. Deep learning models were developed based on multimodal imaging, including non-contrast-enhanced computed tomography, contrast-enhanced computed tomography, and positron emission tomography imaging. The diagnostic performance was evaluated and compared with expert assessments using a receiver operating characteristic curve analysis. The area under the receiver operating characteristic curve values for the deep learning model were 0.81 with multimodal imaging, 0.73 with non-contrast-enhanced computed tomography, 0.72 with contrast-enhanced computed tomography, and 0.75 with positron emission tomography. The area under the curve of the deep learning model (0.81) was comparable to that of experienced experts (area under the curve, 0.84; P = 0.62, DeLong's test). The multimodal deep learning model using computed tomography and positron emission tomography demonstrated performance comparable to that of experts in diagnosing the presence of lymph node metastasis, a key prognostic factor in esophageal cancer, suggesting its potential clinical utility.
- Research Article
- 10.1016/s1470-2045(25)00727-2
- Apr 1, 2026
- The Lancet. Oncology
- Gil Shamai + 15 more
Genomic assays such as Oncotype DX have transformed adjuvant treatment selection for hormone receptor-positive, HER2-negative, early breast cancer but remain inaccessible to many patients because of high cost and logistical barriers. We aimed to develop and validate an artificial intelligence (AI) model that estimates Oncotype DX 21-gene recurrence scores directly from routine histopathology slides and clinicopathological variables. In this multicentre, model development and validation study, a multimodal deep-learning model was trained on digital whole-slide images and clinical features using a foundation model pre-trained on 171 189 histopathology slides for predicting Oncotype DX recurrence score. We included slides from patients with hormone receptor-positive, HER2-negative, invasive breast cancers and without scanning artifacts and with at least 100 tissue tiles (1·6 mm²). The model was fine-tuned and validated on the TAILORx randomised trial (8284 patients after quality control). Prognostic and predictive performance was assessed in the TAILORx-test set and externally validated in six independent cohorts (Carmel, Haemek, and Sheba medical centres [Israel], the University of Chicago Medical Center [USA], the Australian Breast Cancer Tissue Bank [Australia], and the Cancer Genome Atlas Breast Invasive Carcinoma project [USA]). In the TAILORx-test set (n=2407), the AI model classified 1097 (45·6%) patients as low risk, 1021 (42·4%) as intermediate risk, and 289 (12·0%) as high risk. For identifying high genomic-risk disease (recurrence score ≥26), the area under the curve (AUC) was 0·898 (95% CI 0·879-0·913). AI-based risk stratification was prognostic for recurrence-free interval (hazard ratio 2·61 [95% CI 1·68-4·04]), distant recurrence-free interval (2·88 [1·73-4·79]), and disease-free survival (1·32 [0·92-1·89]). 
Chemotherapy benefit was evident in premenopausal patients classified by AI as being at high risk (0·63 [0·46-0·86]) but absent in postmenopausal patients classified by AI as being at low risk (0·94 [0·78-1·12]). 151 (31·3%) clinically high-risk postmenopausal women (by MINDACT criteria) were reclassified as low AI risk with no chemotherapy benefit. Analysis on external cohorts (5497 patients) showed that the model is transferable to new data with high generalisability (recurrence score ≥26 AUC ranging from 0·858 to 0·903). These findings show that AI applied to routine histopathology can serve as a practical and scalable tool for guiding chemotherapy decisions in hormone receptor-positive, HER2-negative, early breast cancer. This approach has the potential to reduce unnecessary chemotherapy and broaden access to precision oncology, particularly in resource-limited settings where genomic testing remains unavailable or unaffordable. Funding: Israel Innovation Authority (Kamin), Zimin Institute for Artificial Intelligence Solutions in Healthcare, Israel Precision Medicine Partnership program, and Israel Cancer Research Fund.
- Research Article
- 10.1002/mco2.70730
- Apr 1, 2026
- MedComm
- Junxian Li + 9 more
Chest radiographs (CXRs) may encode prognostic signals beyond pulmonary nodule detection. We developed LungProNet, a multimodal deep-learning (DL) model that fuses CXR features with four epidemiologic variables (age, sex, smoking history, and family history) for pulmonary nodule detection as the primary task, with secondary validation for all-cause and cause-specific mortality prediction. LungProNet was trained and internally validated on Tianjin Lung Cancer Imaging Dataset (TLCID) (70/30; n = 2852/1227) and externally validated on ChestDR (n = 4848), with stratified analyses across epidemiologic strata. Discrimination was quantified by area under the curve (AUC) (95% confidence intervals), with accuracy, sensitivity, and specificity reported, and results were benchmarked against contemporary machine learning/DL baselines. The pretrained multimodal encoder was transferred without fine-tuning to the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) (n = 24,697); its fused embeddings were used as covariates in Cox proportional-hazards models, and time-dependent AUCs were evaluated at 1-12 years. For nodule detection, AUCs were 0.979 (0.975-0.982) in TLCID and 0.849 (0.835-0.862) in ChestDR; the TLCID stratified model reached 0.990 (0.984-0.994). In PLCO, AUCs were 0.925 (0.892-0.952) for all-cause mortality and 0.939-0.985 for cardiac-, lung cancer-, and Chronic Obstructive Pulmonary Disease (COPD)-cause mortality, with robust subgroup performance. These results support CXR-based nodule flagging within screening workflows and suggest secondary opportunistic risk stratification potential.