Articles published on Multi-modal Deep Learning Models

355 Search results
  • Research Article
  • 10.1016/j.ejrad.2026.112758
Multi-modal deep learning model for predicting recurrence of moderately severe and severe acute pancreatitis.
  • Jun 1, 2026
  • European journal of radiology
  • Zhiqiang Wan + 8 more

  • Research Article
  • 10.1186/s12880-026-02411-2
Deep residual network fusing CT images and clinical variables to predict lung adenocarcinoma aggressiveness.
  • May 12, 2026
  • BMC medical imaging
  • Jia Peng + 9 more

Lung adenocarcinoma presenting as ground-glass nodules (GGNs) comprises three invasive subtypes (adenocarcinoma in situ [AIS], minimally invasive adenocarcinoma [MIA], invasive adenocarcinoma [IAC]) with distinct prognoses and management strategies. Preoperative discrimination of these subtypes remains challenging for radiologists, and existing deep learning models rarely integrate multi-modal data for reliable prediction. This study aimed to develop and internally validate a multi-modal fusion framework based on the standard ResNet50 architecture, integrating CT images, clinical variables, and tumor markers, to improve the preoperative prediction of ground-glass nodule invasiveness. A retrospective study was conducted including 431 patients with pathologically confirmed ground-glass nodules. All patients underwent standard chest computed tomography before surgery. A multi-modal deep learning model was constructed based on the ResNet50 network, combined with clinical characteristics and laboratory indicators. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve, precision, recall, and F1-score with five-fold cross-validation. The proposed multi-modal model achieved an overall accuracy of 72.2%, precision of 95.6%, negative predictive value of 96.0%, weighted F1-score of 40.0%, and multiclass Matthews correlation coefficient of 73.1% in the three-class classification of AIS, MIA, and IAC. Per-class analysis showed precision of 84.6%, 35.7%, and 84.4% and recall of 57.9%, 29.4%, and 81.8% for AIS, MIA, and IAC, respectively. The fusion model yielded a macro-average AUC of 0.87, which was higher than the CT-only model (0.79) and both the senior (0.67) and junior radiologists (0.57). The model demonstrated superior diagnostic performance compared to human readers, particularly for the challenging MIA subtype. 
This multi-modal deep learning model combining CT images, clinical variables, and serum tumor markers enables accurate and robust three-class classification of AIS, MIA, and IAC in ground-glass nodules. The proposed model outperforms both human radiologists and the imaging-only model, suggesting its potential as a reliable auxiliary tool to improve preoperative prediction of lung adenocarcinoma invasiveness and assist clinical decision-making.

  • Research Article
  • 10.1186/s12885-026-16116-w
Multimodal deep learning model integrating electronic medical records and CT images for gallbladder cancer diagnosis: a retrospective multicenter study in China.
  • May 11, 2026
  • BMC cancer
  • Ziming Yin + 8 more

Gallbladder cancer (GBC) is a rare gastrointestinal malignancy with a global 5-year survival rate of less than 5%. Early diagnosis is challenging owing to the lack of specific clinical symptoms. Additionally, the high heterogeneity of gallbladder tumors limits the clinical utility of unimodal deep-learning methods for GBC diagnosis. This study aimed to develop a novel multimodal deep-learning model to facilitate the preoperative diagnosis of GBC in more patients. We conducted a retrospective multicenter study using contrast-enhanced arterial phase computed tomography (CT) images and laboratory examination data from 300 patients (150 GBC cases and 150 non-GBC cases) extracted from electronic medical records of two Grade A tertiary hospitals in Shanghai between 2018 and 2020. A novel two-stage multimodal diagnostic model (GBC-DiagNet) was developed: the first stage achieved coarse segmentation of the gallbladder region using a position-constrained 3D Attention U-Net (improved by combined sampling) to avoid over-segmentation; the second stage realized GBC detection via an adaptive feature fusion strategy, which optimizes the weighted integration of handcrafted radiomic, deep radiomic and laboratory examination features to enhance diagnostic performance. On the independent test set, the model achieved an accuracy of 0.933 (95% confidence interval [95% CI]: 0.927-0.94), specificity of 0.912 (95% CI: 0.904-0.922), sensitivity of 0.962 (95% CI: 0.937-0.986), precision of 0.893 (95% CI: 0.875-0.911), an F1-score of 0.926 (95% CI: 0.919-0.932) and AUC (area under the curve) of 0.9706 (95% CI: 0.961-0.981). Compared with the optimal unimodal model, our model improved accuracy, sensitivity, and F1-score by 14.28%, 16.76%, and 16.85%, respectively. Furthermore, compared to state-of-the-art deep-learning architectures (ResNet, DenseNet, MobileNet, ConvNeXt, ViT), our model exhibited absolute improvements of 7.68% in accuracy, 8.03% in F1-score, and 0.0059 in AUC. 
The proposed multimodal model integrating contrast-enhanced CT and laboratory data achieves stable and clinically meaningful diagnostic performance for gallbladder cancer, supporting its utility as an artificial intelligence-assisted tool for preoperative noninvasive diagnosis.
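
The accuracy, sensitivity, specificity, precision, and F1-score reported above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions only; the function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)     # positive predictive value
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1)


# Hypothetical counts, for illustration only (not the study's test set).
example = binary_metrics(tp=40, fp=5, tn=45, fn=10)
print(example)
```

Because sensitivity and specificity are computed within each class, they are unaffected by the case mix, whereas precision and accuracy shift with the proportion of positive cases.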

  • Research Article
  • 10.1038/s41433-026-04505-1
Multimodal deep learning prediction of treatment response to anti-vascular endothelial growth factor in diabetic macular oedema.
  • May 6, 2026
  • Eye (London, England)
  • Je Moon Yoon + 8 more

To develop and validate a multimodal deep learning model that predicts treatment responses to intravitreal anti-vascular endothelial growth factor (anti-VEGF) injections in patients with diabetic macular oedema (DMO) by combining optical coherence tomography images and clinical data. This study included 107 DMO patients who received three consecutive anti-VEGF treatments. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity. The model's predictions were compared with those of retinal specialists. Among 107 patients, 65 showed good response and 42 showed poor response to treatment. The multimodal model achieved an AUROC of 0.962 (95% CI, 0.945-0.979), accuracy of 0.953 (95% CI, 0.933-0.973), sensitivity of 0.969 (95% CI, 0.951-0.987), and specificity of 0.928 (95% CI, 0.903-0.953) in the internal validation. The model outperformed retinal specialists, who achieved accuracies ranging from 0.571 to 0.857. The multimodal deep learning model demonstrated high accuracy in predicting anti-VEGF treatment responses in DMO patients. This approach could enable more personalised treatment strategies and optimal resource utilisation in ophthalmological care. Further validation with larger, multicentre datasets is warranted to confirm its clinical utility.

  • Research Article
  • 10.1038/s41746-026-02697-0
Multimodal deep learning model for multiclass classification of renal tumors.
  • May 4, 2026
  • NPJ digital medicine
  • Shiwei Luo + 8 more

Accurate classification of renal masses before treatment is crucial for therapeutic decision-making and patient outcomes. This study developed and validated the Multi-Phase Attention Network (MPANet), a multimodal deep learning model integrating multiphase contrast-enhanced CT and clinical information, which can utilize both complete-phase and missing-phase CT data for multiclass classification of four common and easily confusable renal tumors: clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (pRCC), oncocytic neoplasms (including chromophobe renal cell carcinoma (chRCC) and renal oncocytoma (RO)), and fat-poor angiomyolipoma (fpAML). A total of 1688 multi-center cases were enrolled. Across all test sets, MPANet consistently outperformed single-phase models. In the internal test set, MPANet achieved a macro-average AUC of 0.850, a micro-average AUC of 0.865, and an accuracy of 73.3%. These results compared favorably to assessments by four radiologists based on CT (accuracies 43.6-62.4%) and two radiologists using MRI with the clear cell likelihood score (ccLS) system (accuracies 52.5% and 49.5%). The net improvement rate of MPANet over radiologist assessment ranged from 10.9% to 29.7%. In the two external test sets, macro-average AUCs were 0.811 and 0.813, and micro-average AUCs were 0.867 and 0.909, respectively. MPANet shows potential as a clinical decision-support tool for personalized renal tumor diagnosis.

  • Research Article
  • 10.1093/bib/bbag224
Development of a novel multimodal deep learning approach to improve diagnostic precision in ovarian cancer.
  • May 3, 2026
  • Briefings in bioinformatics
  • Po-Chun Chiu + 5 more

Ovarian cancer represents the primary cause of mortality from gynecological malignancies among women. Treatment strategies for benign versus malignant ovarian tumors differ significantly, making accurate preoperative diagnosis essential for clinical decision-making. Traditional ultrasound diagnosis is highly operator-dependent, introducing subjectivity and variability. To improve diagnostic precision in ovarian tumor classification, we developed a multimodal deep learning system that combines ultrasound images with corresponding clinical text reports. We retrospectively analyzed 1342 ultrasound images from 1062 patients who received surgical treatment for ovarian tumors at National Taiwan University Hospital from 2011 to 2021. Patients were classified into benign (n = 612) and malignant (including borderline, n = 450) groups based on pathology. A multimodal deep learning architecture was developed, incorporating DenseNet-121 and Swin Transformer for image feature extraction and Bio-Clinical BERT for processing clinical text reports. The dataset was split using subject-level stratification with five-fold cross-validation and a 15% independent test set. Furthermore, an external validation cohort of 268 effective cases from 3 independent medical centers was utilized to evaluate the model's generalizability. The multimodal model achieved superior performance at the subject level with 81.77% (95% CI: 75.89%, 86.48%) accuracy, 79.59% (95% CI: 70.57%, 86.38%) sensitivity, 83.81% (95% CI: 75.59%, 89.64%) specificity, and an area under the curve (AUC) of 0.88 (95% CI: 0.83, 0.93). In the external validation, the model maintained robust performance with an accuracy of 88.81%, sensitivity of 92.59%, and specificity of 84.96%, outperforming the International Ovarian Tumor Analysis Simple Rules (accuracy 86.4%). Integration of clinical text information significantly improved diagnostic performance compared to image-only models. 
Backward selection analysis revealed that both uterine findings and ovarian tumor descriptions contributed synergistically to the final diagnosis. This study successfully developed a multimodal deep learning model with diagnostic performance superior to traditional operator-dependent approaches. The model shows promise as a diagnostic tool for ovarian tumor classification, offering clinicians a way to improve preoperative diagnostic accuracy and enhance patient care quality.

  • Research Article
  • 10.1016/j.jbi.2026.105001
Explainable multimodal deep learning models for variable-length sequences in critically ill patients.
  • May 1, 2026
  • Journal of biomedical informatics
  • Jennifer Martin + 9 more

  • Research Article
  • 10.3340/jkns.2026.0085
Neurosurgical Application of Artificial Intelligence in Pediatric Neuro-Oncology.
  • May 1, 2026
  • Journal of Korean Neurosurgical Society
  • Joo Whan Kim

Pediatric neuro-oncology is a critical field of neurosurgery, addressing the leading cause of disease-related mortality in children. Despite its rarity, it encompasses over 100 diverse disease entities, which significantly complicates preoperative differential diagnosis and surgical planning. This review examines how artificial intelligence (AI) can address these unmet clinical challenges throughout the perioperative period. Preoperatively, AI-driven radiogenomic models extract pixel-level features to enable non-invasive molecular subtyping, such as predicting B-Raf proto-oncogene alteration status in pediatric low-grade gliomas (pLGGs). Such insights are vital for determining the extent of resection (EOR) in light of the availability of targeted therapies. Furthermore, AI facilitates automated tumor segmentation, allowing for meticulous surgical planning and more accurate assessment of surgical risks. Intraoperatively, AI significantly accelerates diagnostic turnaround times, which is essential for real-time decision-making. Emerging technologies, including Oxford Nanopore sequencing with neural network classifiers and stimulated Raman histology, allow for the rapid identification of tumor characteristics within the operative time window. These tools directly inform the optimal EOR, particularly in cases like medulloblastoma where molecular subgroups dictate surgical aggressiveness. Additionally, AI integration into intraoperative neurophysiological monitoring enhances the preservation of critical neurological functions. Postoperatively, multimodal deep learning models integrate clinical, imaging, and genomic data to improve prognostic accuracy and standardize response assessment. While challenges such as data scarcity and the "black box" nature of algorithms persist, innovative strategies offer potential solutions to AI application. 
AI serves as a transformative tool for personalized precision management, potentially bridging diagnostic disparities and optimizing clinical outcomes for children with central nervous system tumors.

  • Research Article
  • 10.1016/j.ejca.2026.116679
MuTriM: A multiscale deep learning model integrating longitudinal radiomics and pathomic features for predicting recurrence and adjuvant radiation benefit in breast cancer.
  • May 1, 2026
  • European journal of cancer (Oxford, England : 1990)
  • Xiangxue Wang + 14 more

  • Research Article
  • 10.1007/s10278-026-01980-6
Radiologist-AI Collaboration for Ischemia Diagnosis in Small-Bowel Obstruction: Multicentric Development and External Validation of a Multimodal Deep Learning Model.
  • Apr 29, 2026
  • Journal of imaging informatics in medicine
  • Quentin Vanderbecq + 6 more

This study aims to develop and externally validate a multimodal AI model for detecting ischemia complicating small-bowel obstruction (SBO). We combined 3D CT data with routine laboratory markers (C-reactive protein, neutrophil count) and, optionally, radiology report indication/history text. From two centers, 1350 CT examinations were curated; 771 confirmed SBO scans were used for model development with patient-level splits. Ischemia labels were defined by surgical confirmation within 24 h of imaging. Models (MViT, ResNet-101, DaViT) were trained as unimodal and multimodal variants. External testing used 66 independent cases from a third center. Four radiologists (two residents and two experts) read the test set with and without AI assistance. Performance was assessed using AUC, sensitivity, specificity, and 95% bootstrap confidence intervals; predictions included a confidence score. The image-plus-laboratory model performed best on external testing (AUC 0.69 [0.59-0.79], sensitivity 0.89 [0.76-1.00], and specificity 0.44 [0.35-0.54]). Adding report text improved internal validation but did not generalize externally; image + text and full multimodal variants did not exceed image + laboratory performance. Across readers, baseline AUC ranged from 0.496 [0.361-0.640] to 0.745 [0.589-0.875] and increased with reader experience. With AI assistance, AUC ranged from 0.565 [0.419-0.717] to 0.845 [0.714-0.952] and from 0.519 [0.373-0.669] to 0.845 [0.708-0.954] when confidence scores were displayed, showing consistent but non-significant changes regardless of experience level. A multimodal model combining CT and lab data surpassed unimodal approaches for 24-h ischemia detection; as a triage-support tool, it showed a consistent but non-significant improvement in radiologist performance.
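
The abstract above reports AUC with 95% bootstrap confidence intervals. As a rough illustration of how such an interval can be obtained, the sketch below computes a rank-based (Mann-Whitney) AUC and a percentile bootstrap interval. The percentile method, the resample count, and the names `auc` and `bootstrap_auc_ci` are assumptions made for illustration; the study does not state which bootstrap variant it used.

```python
import random


def auc(labels, scores):
    """Mann-Whitney AUC: probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    auc(labels, scores)  # fail early if one class is missing from the data
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical labels and model scores, for illustration only.
y = [0, 0, 1, 1, 0, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]
print(auc(y, p), bootstrap_auc_ci(y, p, n_boot=500))
```

With a test set of only 66 cases, as in the external cohort above, intervals of this kind are expected to be wide, which is consistent with the non-significant reader differences the authors report.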

  • Research Article
  • 10.3389/fneur.2026.1791696
Multimodal machine learning for distinguishing pediatric multiple sclerosis from non-inflammatory conditions using optical coherence tomography.
  • Apr 21, 2026
  • Frontiers in neurology
  • Chaojun Chen + 4 more

Identifying multiple sclerosis (MS) in children early is critical, as early therapeutic intervention can improve outcomes. The anterior visual pathway has been demonstrated to be of central importance in diagnostic considerations for MS and has recently been identified as a fifth topography in the McDonald Diagnostic Criteria for MS. Optical coherence tomography (OCT) provides high-resolution retinal imaging and reflects the structural integrity of the retinal nerve fiber and ganglion cell inner plexiform layers. Whether multimodal deep learning models can use OCT alone to diagnose pediatric onset MS (POMS) is unknown. We analyzed 3D OCT scans collected prospectively through the Neuroinflammatory Registry of the Hospital for Sick Children (REB#1000005356). Raw macular and optic nerve head images, and 52 automatically segmented features were included. We evaluated three classification approaches: (1) deep learning models (e.g., ResNet, DenseNet) for representation learning followed by classical ML classifiers, (2) ML models trained on OCT-derived features, and (3) multimodal models combining both via early and late fusion. Scans from individuals with POMS (onset 16.0 ± 3.1 years, 51.0% female; 211 scans) and 29 children with non-inflammatory neurological conditions (13.1 ± 4.0 years, 69.0% female, 52 scans) were included. The early fusion model achieved the highest performance (AUC: 0.90, weighted F1: 0.87, macro F1: 0.77, accuracy: 87%), outperforming both unimodal and late fusion models. The best unimodal feature-based model (SVC) yielded an AUC of 0.84, weighted F1 of 0.85, macro F1 of 0.73, and accuracy of 85%, while the best image-based model (ResNet101 with SVC) achieved an AUC of 0.79, weighted F1 of 0.84, macro F1 of 0.70, and accuracy of 87%. Late fusion underperformed, reaching 82% accuracy but failing in the minority class. 
Multimodal learning with early fusion significantly enhances diagnostic performance by combining spatial retinal information with clinically relevant structural features. This approach captures complementary patterns associated with MS pathology and shows promise as an AI-driven tool to support pediatric neuroinflammatory diagnosis.
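
The distinction between early and late fusion drawn in this abstract can be illustrated with a toy sketch. Early fusion joins the per-modality feature vectors before a single classifier sees them, while late fusion combines the outputs of separately trained classifiers. The function names and the weighted-average rule for late fusion are assumptions for illustration; the abstract does not specify how the study combined predictions.

```python
def early_fusion(image_embedding, tabular_features):
    """Feature-level fusion: one concatenated vector feeds a single classifier."""
    return list(image_embedding) + list(tabular_features)


def late_fusion(prob_image, prob_tabular, weight_image=0.5):
    """Decision-level fusion: weighted average of per-modality probabilities.

    The averaging rule is an assumed, common choice, not the study's method.
    """
    if not 0.0 <= weight_image <= 1.0:
        raise ValueError("weight_image must lie in [0, 1]")
    return weight_image * prob_image + (1.0 - weight_image) * prob_tabular


# Hypothetical inputs: a 3-dimensional image embedding and 2 segmented features.
fused_vector = early_fusion([0.12, -0.40, 0.88], [101.5, 0.93])
fused_probability = late_fusion(prob_image=0.8, prob_tabular=0.4)
print(fused_vector, fused_probability)
```

One practical difference is that an early-fusion classifier can learn interactions between image and tabular features, whereas late fusion only sees each modality's final score.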

  • Research Article
  • 10.3390/biomedicines14040946
Neural Network-Based Prediction of Residual Paravalvular Leak in Bicuspid Aortic Valve TAVI Using CT-Derived Anatomical Features.
  • Apr 21, 2026
  • Biomedicines
  • Yijun Yao + 7 more

Background/Objectives: Transcatheter aortic valve implantation (TAVI) in patients with bicuspid aortic valve (BAV) remains associated with higher rates of residual paravalvular leak (PVL), which confers a two-fold increase in mortality. Despite procedural optimization including balloon post-dilatation, a subset of patients exhibit residual ≥moderate PVL. Pre-procedural identification of these patients could guide procedural planning. Methods: We retrospectively analyzed 402 BAV patients who underwent TAVI with self-expanding valves and balloon post-dilatation between January 2016 and June 2024. A multi-modal deep learning model (Model B) was developed, integrating a 3D ResNet encoder for computed tomography (CT) imaging features with a multilayer perceptron (MLP) for clinical variables, fused via a cross-attention mechanism. Its performance was compared against a conventional model (Model A) combining clinical variables with manually derived CT measurements. Both models were evaluated on identical test folds using 5-fold stratified cross-validation. Results: Of 402 patients, 36 (9.0%) had residual ≥moderate PVL, associated with significantly larger aortic root dimensions at all anatomical levels and greater aortic valve calcification volume (median 887.6 vs. 559.2 mm³; p = 0.004). Model A achieved a mean AUC of 0.694 (95% CI: 0.596-0.792). Model B achieved a mean AUC of 0.822 (95% CI: 0.680-0.964), with a specificity of 0.971, accuracy of 0.881, and PPV of 0.860, while sensitivity was 0.429, reflecting the limited number of outcome events in this cohort. Conclusions: A multi-modal deep learning model integrating expert-segmented CT imaging with clinical variables demonstrated significantly improved discrimination over the conventional approach in this internal cohort for predicting residual PVL in BAV-TAVI, supporting the integration of segmentation-guided deep learning into pre-procedural TAVI planning. 
However, given the modest number of outcome events, external validation is required to confirm the generalizability of these findings.

  • Research Article
  • 10.3390/brainsci16040405
Benchmarking Multimodal Deep Fusion Strategies for Heterogeneous Neuroimaging and Cognitive Data Using a Controlled Sex Classification Task.
  • Apr 10, 2026
  • Brain sciences
  • Chiara Camastra + 3 more

Background/Objectives: Multimodal data fusion is increasingly applied in neuroinformatics to integrate heterogeneous sources of information. However, the optimal strategies for combining modalities with markedly different dimensionality, scale, and noise characteristics remain unclear. To our knowledge, this is among the first systematic and controlled benchmarks explicitly disentangling the effects of fusion strategy and feature scaling within a unified deep learning framework. Methods: Using data from 747 healthy participants from the Human Connectome Project, we evaluated multiple fusion paradigms (including early fusion, attention-based fusion, subspace-based fusion, and graph-based fusion) within a unified and reproducible framework. Importantly, we assessed how different feature scaling techniques (Standard, Min-Max, and Robust scaling) interact with fusion strategies and influence model performance. Biological sex was used as a controlled benchmark task to focus on methodological insights rather than task-specific optimization. Results: Early feature-level fusion consistently achieved the highest classification performance across all evaluated configurations. In particular, direct concatenation of cognitive and neuroimaging features combined with Standard Scaling yielded the best results (AUC-ROC = 0.96 (0.95-0.96)), outperforming unimodal baselines as well as intermediate and late fusion strategies. Conclusions: This systematic benchmark demonstrates that multimodal deep learning performance in neuroscience is driven primarily by the interaction between fusion strategy and feature scaling rather than by architectural complexity alone. By explicitly disentangling the effects of fusion level and preprocessing within a unified framework, this study provides practical methodological guidance for the design, evaluation, and reproducible deployment of multimodal deep learning models in neuroscience.
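
The best-performing configuration described above, Standard Scaling followed by direct concatenation, can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the scaling statistics are computed on the rows passed in, and a real evaluation would fit them on the training split only to avoid leakage.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Z-score one feature column: subtract the mean, divide by the std. deviation."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant feature carries no information
    return [(x - mu) / sigma for x in column]


def scale_columns(rows):
    """Apply standard scaling to every column of a row-major feature table."""
    scaled_columns = [standard_scale(col) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


def scale_then_concatenate(cognitive_rows, imaging_rows):
    """Scale each modality per feature, then concatenate per participant."""
    if len(cognitive_rows) != len(imaging_rows):
        raise ValueError("both modalities need one row per participant")
    cognitive = scale_columns(cognitive_rows)
    imaging = scale_columns(imaging_rows)
    return [c + i for c, i in zip(cognitive, imaging)]


# Hypothetical data: 2 participants, 1 cognitive score, 2 imaging features.
fused = scale_then_concatenate([[1.0], [3.0]], [[10.0, 100.0], [30.0, 300.0]])
print(fused)
```

Scaling before concatenation keeps a modality with large raw values, such as the second imaging feature here, from dominating the fused vector simply because of its units.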

  • Research Article
  • 10.3390/cancers18081194
Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-Based Peptide Receptor Radionuclide Therapy.
  • Apr 8, 2026
  • Cancers
  • Simon Baur + 13 more

Background/Objectives: Peptide receptor radionuclide therapy (PRRT) is an established treatment for metastatic neuroendocrine tumors (NETs), yet long-term disease control occurs only in a subset of patients. Predicting progression-free survival (PFS) could support individualized treatment planning. This study evaluates laboratory, imaging, and multimodal deep learning models for PFS prediction in PRRT-treated patients. Methods: In this retrospective, single-center study, 116 patients with metastatic NETs undergoing [177Lu]Lu-DOTATOC were included. Clinical characteristics, laboratory values, and pretherapeutic somatostatin receptor positron emission tomography/computed tomography scans (SR-PET/CTs) were collected. Seven models were trained to classify low- vs. high-PFS groups, including unimodal (laboratory, SR-PET, or CT) and multimodal fusion approaches. Performance was assessed via repeated 3-fold cross-validation with area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Explainability was evaluated by feature importance analysis and gradient-based saliency maps. Results: Forty-two patients (36%) displayed short PFS (≤1 year) and 74 patients displayed long PFS (>1 year). Groups were similar in most characteristics, except for higher baseline chromogranin A (p = 0.003), elevated γ-GT (p = 0.002), and fewer PRRT cycles (p < 0.001) in short-PFS patients. The Random Forest model trained only on laboratory biomarkers reached an AUROC of 0.59 ± 0.02. Unimodal three-dimensional convolutional neural networks using SR-PET or CT performed worse (AUROC 0.42 ± 0.03 and 0.54 ± 0.01, respectively). A multimodal fusion model integrating laboratory values, SR-PET, and CT, augmented with a pretrained CT branch, achieved the best results (AUROC 0.72 ± 0.01, AUPRC 0.80 ± 0.01). 
Explainability analyses provided insights into model predictions, with explainability patterns in the fusion model appearing physiologically plausible and predominantly tumor-focused. Conclusions: Multimodal deep learning combining SR-PET, CT, and laboratory biomarkers outperformed unimodal approaches for PFS prediction after PRRT. Upon external validation, such models may support risk-adapted follow-up strategies.

  • Research Article
  • 10.1186/s12880-026-02312-4
Investigation of multimodal deep learning models for predicting ovarian tumor malignancy based on ultrasound images and clinical information - a comprehensive comparative study against readers and O-RADS.
  • Apr 6, 2026
  • BMC medical imaging
  • Lei Lai + 10 more

As ovarian cancer is the second deadliest cancer affecting women globally, precise and timely classification of ovarian tumors plays an instrumental role in improving cure rates and reducing mortality. This study set out to comprehensively investigate the effectiveness of a deep learning model for classifying benign and malignant ovarian tumors, utilizing multimodal ultrasound images and clinical data, in comparison to traditional methods such as manual assessment by radiologists and those based on O-RADS. This retrospective multicenter study recruited women diagnosed with ovarian tumors between January 2022 and June 2023, with histopathological examination results as the reference diagnoses. The dataset was divided into three subsets: training (70%), validation (10%), and test (20%). Employing the Dense Convolutional Network algorithm, we constructed and investigated two fusion models: DLM2F, integrating multimodal features extracted from ultrasound (grayscale ultrasound, color Doppler flow imaging), and DLM3F, integrating DLM2F with clinical data (e.g., age, CA125, CA199, HE4, SCC, ROMA index, menopausal state, and mass volume). The outcome measure was the area under the receiver operating characteristic curve (AUC). We compared the models' performance in the test dataset against radiologists, O-RADS, and single-mode models. A total of 508 patients with ovarian tumors (mean age: 44.3 ± 15.9 years) were enrolled, including 327 benign and 181 malignant tumors. In the test set, the DLM2F model demonstrated an AUC of 0.919, sensitivity of 0.865 and specificity of 0.879, while the DLM3F model showed an AUC of 0.951, sensitivity of 0.865 and specificity of 0.939. Comparatively, radiologists scored AUCs of 0.896 (Expert level III) and 0.827 (Expert level I), while O-RADS achieved an AUC of 0.835. 
Evaluation of confusion matrices revealed that the DLM3F model exhibited accuracy almost identical to that of a level III expert, demonstrating its promising potential as a clinical diagnostic tool to assist junior radiologists. The deep learning model integrating multimodal ultrasound images and clinical information is capable of discriminating between benign and malignant ovarian tumors, exceeding the diagnostic capabilities of both radiologists and O-RADS assessments.

  • Research Article
  • 10.3174/ajnr.a9016
Multimodal CT Perfusion-Based Deep Learning for Predicting Stroke Lesion Outcomes in Complete and No Recanalization Scenarios.
  • Apr 2, 2026
  • AJNR. American journal of neuroradiology
  • Hongxi Yang + 11 more

Predicting the final location and volume of lesions in acute ischemic stroke is crucial for clinical management. While CTP is routinely used for estimating lesion outcomes, conventional threshold-based methods have limitations. We developed specialized outcome-prediction deep learning models that predict infarct core in successful reperfusion cases and the combined core-penumbra region in unsuccessful reperfusion cases. We developed single-modal and multimodal deep learning models using CTP parameter maps to predict the final infarct lesion on follow-up DWI. Using a multicenter data set from multiple sites, we developed deep learning models and evaluated them separately for patients with complete recanalization (successful reperfusion [CR], n = 350) and no recanalization (unsuccessful reperfusion [NR], n = 138) after treatment. The CR model was designed to predict the infarct core region, while the NR model predicted the expanded, hypoperfused tissue encompassing both the core and penumbra regions. Five-fold cross-validation was performed for robust evaluation. The multimodal 3D nnU-Net model demonstrated superior performance, achieving mean Dice scores of 35.36% in patients with CR and 50.22% in those with NR. This model substantially outperformed the current clinically used method, providing more accurate outcome estimates than the conventional single-technique threshold-based measures, which yielded Dice scores of 15.73% and 39.71% for CR and NR groups, respectively. Our approach offered both successful reperfusion and unsuccessful reperfusion estimations for potential treatment outcomes, enabling clinicians to better evaluate treatment eligibility for reperfusion therapies and assess potential treatment benefits. 
This advancement facilitates more personalized treatment recommendations and has the potential to substantially enhance clinical decision-making in acute ischemic stroke management by providing more accurate tissue outcome predictions than conventional single-technique threshold-based approaches.
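The Dice scores quoted in this abstract measure the overlap between a predicted lesion mask and the reference lesion on follow-up imaging. A minimal sketch of the standard Dice coefficient follows; the function name, the flat 0/1 mask representation, and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as lesion in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total lesion voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (hypothetical data): 4 predicted voxels, 4 true voxels, 3 shared.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
follow_up = [0, 0, 1, 1, 1, 0, 1, 0]
print(f"Dice = {dice_score(predicted, follow_up):.2%}")  # Dice = 75.00%
```

A real evaluation would flatten each 3D volume into such a sequence, or use an array library, and average the per-patient scores.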

  • Research Article
  • 10.1186/s12874-026-02845-w
DSPONVNet: a multimodal deep learning model integrating intraoperative monitoring and clinical features for predicting postoperative nausea and vomiting risk
  • Apr 2, 2026
  • BMC Medical Research Methodology
  • Lixin Liu + 5 more


  • Research Article
  • 10.1007/s00595-025-03152-5
Utility of multimodal deep learning model to diagnose lymph node metastasis in esophageal cancer using computed tomography and positron emission tomography images.
  • Apr 1, 2026
  • Surgery today
  • Yasuharu Shinozaki + 10 more

This study aimed to assess the performance of a deep learning model using multimodal imaging for detecting lymph node metastasis in esophageal cancer in comparison to expert assessments. A retrospective analysis was performed for 521 lymph nodes from 167 patients with esophageal cancer who underwent esophagectomy. Deep learning models were developed based on multimodal imaging, including non-contrast-enhanced computed tomography, contrast-enhanced computed tomography, and positron emission tomography imaging. The diagnostic performance was evaluated and compared with expert assessments using a receiver operating characteristic curve analysis. The area under the receiver operating characteristic curve values for the deep learning model were 0.81 with multimodal imaging, 0.73 with non-contrast-enhanced computed tomography, 0.72 with contrast-enhanced computed tomography, and 0.75 with positron emission tomography. With an area under the curve of 0.81, the multimodal deep learning model demonstrated diagnostic performance comparable to that of experienced experts (area under the curve, 0.84; P = 0.62, DeLong's test). The multimodal deep learning model using computed tomography and positron emission tomography demonstrated performance comparable to that of experts in diagnosing the presence of lymph node metastasis, a key prognostic factor in esophageal cancer, suggesting its potential clinical utility.
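The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen metastatic node receives a higher model score than a randomly chosen benign node. A minimal sketch of that pairwise definition follows; the function name, scores, and labels are hypothetical and do not come from the study, and the DeLong comparison mentioned in the abstract is not implemented here.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical per-node scores; label 1 = metastatic, 0 = benign.
node_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
node_labels = [1, 1, 1, 0, 0, 0]
print(f"AUC = {roc_auc(node_scores, node_labels):.3f}")  # AUC = 0.778
```

The pairwise loop is quadratic in the number of cases, which is fine for a few hundred nodes; a rank-based formulation gives the same value more efficiently.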

  • Research Article
  • 10.1016/s1470-2045(25)00727-2
Deep learning on histopathological images to predict breast cancer recurrence risk and chemotherapy benefit: a multicentre, model development and validation study.
  • Apr 1, 2026
  • The Lancet. Oncology
  • Gil Shamai + 15 more

Genomic assays such as Oncotype DX have transformed adjuvant treatment selection for hormone receptor-positive, HER2-negative, early breast cancer but remain inaccessible to many patients because of high cost and logistical barriers. We aimed to develop and validate an artificial intelligence (AI) model that estimates Oncotype DX 21-gene recurrence scores directly from routine histopathology slides and clinicopathological variables. In this multicentre, model development and validation study, a multimodal deep-learning model was trained on digital whole-slide images and clinical features using a foundation model pre-trained on 171 189 histopathology slides for predicting Oncotype DX recurrence score. We included slides from patients with hormone receptor-positive, HER2-negative, invasive breast cancers and without scanning artifacts and with at least 100 tissue tiles (1·6 mm²). The model was fine-tuned and validated on the TAILORx randomised trial (8284 patients after quality control). Prognostic and predictive performance was assessed in the TAILORx-test set and externally validated in six independent cohorts (Carmel, Haemek, and Sheba medical centres [Israel], the University of Chicago Medical Center [USA], the Australian Breast Cancer Tissue Bank [Australia], and the Cancer Genome Atlas Breast Invasive Carcinoma project [USA]). In the TAILORx-test set (n=2407), the AI model classified 1097 (45·6%) patients as low risk, 1021 (42·4%) as intermediate risk, and 289 (12·0%) as high risk. For identifying high genomic-risk disease (recurrence score ≥26), the area under the curve (AUC) was 0·898 (95% CI 0·879-0·913). AI-based risk stratification was prognostic for recurrence-free interval (hazard ratio 2·61 [95% CI 1·68-4·04]), distant recurrence-free interval (2·88 [1·73-4·79]), and disease-free survival (1·32 [0·92-1·89]). 
Chemotherapy benefit was evident in premenopausal patients classified by AI as being at high risk (0·63 [0·46-0·86]) but absent in postmenopausal patients classified by AI as being at low risk (0·94 [0·78-1·12]). 151 (31·3%) clinically high-risk postmenopausal women (by MINDACT criteria) were reclassified as low AI risk with no chemotherapy benefit. Analysis on external cohorts (5497 patients) showed that the model is transferable to new data with high generalisability (recurrence score ≥26 AUC ranging from 0·858 to 0·903). These findings show that AI applied to routine histopathology can serve as a practical and scalable tool for guiding chemotherapy decisions in hormone receptor-positive, HER2-negative, early breast cancer. This approach has the potential to reduce unnecessary chemotherapy and broaden access to precision oncology, particularly in resource-limited settings where genomic testing remains unavailable or unaffordable. Funding: Israel Innovation Authority (Kamin), Zimin Institute for Artificial Intelligence Solutions in Healthcare, Israel Precision Medicine Partnership program, and Israel Cancer Research Fund.

  • Research Article
  • 10.1002/mco2.70730
Multimodal Deep Learning for Pulmonary Nodule Detection on Chest Radiography in High-Risk Adults, With Secondary Validation for All-Cause and Cause-Specific Mortality Prediction: A Multicenter Cohort Study.
  • Apr 1, 2026
  • MedComm
  • Junxian Li + 9 more

Chest radiographs (CXRs) may encode prognostic signals beyond pulmonary nodule detection. We developed LungProNet, a multimodal deep-learning (DL) model that fuses CXR features with four epidemiologic variables (age, sex, smoking history, and family history) for pulmonary nodule detection as the primary task, with secondary validation for all-cause and cause-specific mortality prediction. LungProNet was trained and internally validated on the Tianjin Lung Cancer Imaging Dataset (TLCID) (70/30 split; n = 2852/1227) and externally validated on ChestDR (n = 4848), with stratified analyses across epidemiologic strata. Discrimination was quantified by area under the curve (AUC) with 95% confidence intervals, accuracy, sensitivity, and specificity were also reported, and results were benchmarked against contemporary machine learning/DL baselines. The pretrained multimodal encoder was transferred without fine-tuning to the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) (n = 24,697); its fused embeddings were used as covariates in Cox proportional-hazards models, and time-dependent AUCs were evaluated at 1-12 years. For nodule detection, AUCs were 0.979 (0.975-0.982) in TLCID and 0.849 (0.835-0.862) in ChestDR; the TLCID stratified model reached 0.990 (0.984-0.994). In PLCO, AUCs were 0.925 (0.892-0.952) for all-cause mortality and 0.939-0.985 for cardiac, lung cancer, and chronic obstructive pulmonary disease (COPD) mortality, with robust subgroup performance. These results support CXR-based nodule flagging within screening workflows and suggest secondary opportunistic risk stratification potential.
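Accuracy, sensitivity, and specificity, which this abstract mentions alongside AUC, all derive from the four cells of a confusion matrix at a fixed decision threshold. A minimal sketch follows, assuming binary predictions are already available; the function name and the toy data are hypothetical and unrelated to the reported results.

```python
def screening_metrics(predictions, labels):
    """Accuracy, sensitivity, and specificity from binary predictions and labels."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # nodules correctly flagged
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # nodules missed
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false alarms
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical radiograph-level outputs; label 1 = nodule present.
preds = [1, 1, 0, 1, 0, 0, 0, 1]
truth = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = screening_metrics(preds, truth)
print(metrics)
```

Unlike AUC, these three values change with the chosen threshold, which is why papers usually report them together with the threshold-free AUC.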
