Application of multimodal deep learning using radar and water level data for water level prediction

Abstract

In general, deep learning models for water level prediction have been developed using only time-series observations from upstream water level stations and the target station, even though many other types of physical data are needed to predict water levels. Changes in water level are strongly affected by rainfall in the basin, so rainfall information is needed to predict water levels more accurately. In particular, radar data have the advantage of directly capturing the amount of rainfall occurring within a watershed. This study aims to develop a multimodal deep learning model that predicts the water level using 2D grid radar rainfall data and 1D time-series water level observations. Two multimodal deep learning models with different structures are proposed. Both predict the water level by simultaneously using the water level observed up to the present time and the radar rainfall data that affect future water levels. The first model is a deep learning network that links 2D Average Pooling (AvgPool2D), which compresses the 2D radar data into 1D data, with Long Short-Term Memory (LSTM), which predicts the 1D water level time series. The second model predicts water levels by linking Conv2DLSTM, which can reflect the characteristics of the 2D radar data without deforming them, with an LSTM. Both multimodal models were trained and evaluated in the upper basin of the Hantan River and compared with a single-modal LSTM that uses only water level data. The study area contains three water level stations, and the objective was to predict the water level at the downstream station up to 180 minutes in advance. To train and validate the models, 10-minute water level and radar rainfall data were collected from May 2019 to October 2021.
For the radar input, the grid cells within the target watershed were extracted from the Ministry of Environment's composite radar data, which have a resolution of 1 km. When the trained models were evaluated, both multimodal models achieved higher prediction accuracy than the single-modal LSTM using only water level data. In particular, the second model (Conv2DLSTM+LSTM) outperformed the first model (AvgPool2D+LSTM) during sudden rises in water level caused by rainfall.

Acknowledgments
Research for this paper was carried out under the KICT Research Program (project no. 202200175-001, Development of future-leading technologies solving water crisis against to water disasters affected by climate change) funded by the Ministry of Science and ICT.
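The input pipeline of the first model (AvgPool2D+LSTM) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: each radar frame is reduced to one watershed-mean rainfall value (a global AvgPool2D; the abstract does not give the actual pooling size), each input step pairs the water level with that pooled rainfall, and the targets are the next 18 ten-minute steps (180 minutes). The function names, the input window length, and aligning radar frames with past time steps only are hypothetical choices.

```python
from typing import List, Tuple

# One radar rainfall frame for the watershed: rows x cols of grid-cell values.
Grid = List[List[float]]

STEP_MINUTES = 10      # observation interval stated in the abstract
HORIZON_MINUTES = 180  # maximum lead time stated in the abstract
HORIZON_STEPS = HORIZON_MINUTES // STEP_MINUTES  # 18 steps


def global_avg_pool(grid: Grid) -> float:
    """Average every cell of a 2D frame into one value (assumed global AvgPool2D)."""
    cells = [value for row in grid for value in row]
    if not cells:
        raise ValueError("empty radar frame")
    return sum(cells) / len(cells)


def make_samples(
    levels: List[float], radar: List[Grid], window: int
) -> List[Tuple[List[Tuple[float, float]], List[float]]]:
    """Build (input, target) pairs from aligned 10-minute series.

    Input: the last `window` steps as (water level, pooled rainfall) pairs.
    Target: the water levels for the next HORIZON_STEPS steps.
    `window` is a hypothetical hyperparameter, not taken from the paper.
    """
    if len(levels) != len(radar):
        raise ValueError("water level and radar series must be aligned")
    if window < 1:
        raise ValueError("window must be at least 1")
    pooled = [global_avg_pool(frame) for frame in radar]
    samples = []
    # t is the first step after the input window; the targets start there.
    for t in range(window, len(levels) - HORIZON_STEPS + 1):
        inputs = list(zip(levels[t - window:t], pooled[t - window:t]))
        targets = levels[t:t + HORIZON_STEPS]
        samples.append((inputs, targets))
    return samples


if __name__ == "__main__":
    # Toy data: a steadily rising level and a constant 2x2 rainfall frame.
    levels = [float(i) for i in range(30)]
    radar = [[[1.0, 3.0], [5.0, 7.0]] for _ in range(30)]
    samples = make_samples(levels, radar, window=6)
    print(len(samples), samples[0][0][0], len(samples[0][1]))  # prints: 7 (0.0, 4.0) 18
```

For the second model (Conv2DLSTM+LSTM), the pooling step would be skipped and the radar frames passed on as a sequence of 2D grids, which is what lets that branch keep the spatial rainfall pattern that averaging discards.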

Similar Papers
  • Research Article
  • Citations: 13
  • 10.1007/s00261-024-04202-1
Predicting microvascular invasion in hepatocellular carcinoma with a CT- and MRI-based multimodal deep learning model.
  • Mar 3, 2024
  • Abdominal radiology (New York)
  • Yan Lei + 10 more

To investigate the value of a multimodal deep learning (MDL) model based on computed tomography (CT) and magnetic resonance imaging (MRI) for predicting microvascular invasion (MVI) in hepatocellular carcinoma (HCC). A total of 287 patients with HCC from our institution and 58 patients from another institution were included. Among these, 119 patients with only CT data and 116 patients with only MRI data were selected for single-modality deep learning model development, after which selected parameters were transferred to the MDL model using transfer learning (TL). In addition, 110 patients with simultaneous CT and MRI data were divided into a training cohort (n = 66) and a validation cohort (n = 44). We fed the features extracted by DenseNet121 into an extreme learning machine (ELM) classifier to construct a classification model. The area under the curve (AUC) of the MDL model was 0.844, which was superior to that of the single-phase CT models (AUC = 0.706-0.776, P < 0.05), the single-sequence MRI models (AUC = 0.706-0.717, P < 0.05), the single-modality DL models (AUC for all-phase CT = 0.722, AUC for all-sequence MRI = 0.731; P < 0.05), and the clinical model (AUC = 0.648, P < 0.05), but not to that of the delay phase (DP) and in-phase (IP) MRI and portal venous phase (PVP) CT models. The MDL model achieved better performance than the models described above (P < 0.05). When combined with clinical features, the AUC of the MDL model increased from 0.844 to 0.871. A nomogram combining deep learning signatures (DLS) and clinical indicators for MDL models demonstrated a greater overall net gain than the MDL models (P < 0.05). The MDL model is a valuable noninvasive technique for preoperatively predicting MVI in HCC.
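Most studies in this list report the area under the ROC curve (AUC). It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. Below is a minimal pure-Python sketch of that pairwise (Mann-Whitney) definition; the function name is illustrative, and the O(n_pos × n_neg) double loop is only meant for small cohorts like the ones described here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the pairwise Mann-Whitney formulation.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model outputs where higher means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```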

  • Research Article
  • Citations: 4
  • 10.1016/j.jdent.2023.104588
Multi-modal deep learning for automated assembly of periapical radiographs
  • Jun 21, 2023
  • Journal of Dentistry
  • L Pfänder + 5 more


  • Research Article
  • Citations: 1
  • 10.59141/jiss.v4i09.883
Analysis of Time Series Water Level Data Prediction Using Deep Learning Method at the Water Gate of DKI Jakarta Water Resources Office
  • Aug 25, 2023
  • Jurnal Indonesia Sosial Sains
  • Supriyade Supriyade + 3 more

Indonesia has two seasons: the dry season and the rainy season. During the rainy season, many locations in the DKI Jakarta area experience flooding or inundation. Jakarta's frequent flooding has several causes, including local rain floods, shipment floods, and tidal floods. The DKI Jakarta Water Resources Agency currently has no system that can predict future water levels from past and present water level data. Against this background, the author studies one of the floodgates in the northern area of DKI Jakarta and predicts its water levels using deep learning methods, namely the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). The purpose of this research is to identify the best deep learning model and to predict water level time-series data. The analysis found that the best deep learning model is LSTM, after testing the number of inputs (n-input), a data split of 90.33% training data and 9.67% test data, and different parameters including epochs, batch size, learning rate, and dropout. The lowest error values obtained were RMSE 17.65, MAPE 0.29, and MAE 3.37, with a processing time (runtime) of 39 minutes.
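The three error metrics and the train/test split mentioned above can be sketched as follows. This is an illustrative plain-Python version, not the study's code: the split is assumed to be chronological (no shuffling), which is the usual choice for forecasting but is not stated in the abstract, and MAPE is expressed here in percent, whereas the abstract does not say which convention its value of 0.29 uses. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_frac: float
) -> Tuple[List[float], List[float]]:
    """Split without shuffling: the earliest observations train, the latest test."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be between 0 and 1")
    cut = int(round(len(series) * train_frac))
    return list(series[:cut]), list(series[cut:])


def _check(observed: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the number of points."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    return len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the water level."""
    n = _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the water level."""
    n = _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if any observation is 0."""
    n = _check(observed, predicted)
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is 0")
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n
```

Because RMSE squares each error, a few large misses (for example, during a sudden flood peak) raise it much more than MAE, so reporting both gives a sense of how concentrated the errors are.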

  • Research Article
  • Citations: 3
  • 10.1186/s13058-025-02129-z
Multimodal deep learning model for prediction of breast cancer recurrence risk and correlation with oncotype DX
  • Jan 1, 2025
  • Breast Cancer Research : BCR
  • Ruixin Zhang + 7 more

Background: Proper stratification of recurrence risk in breast cancer is crucial for guiding treatment decisions. This study aims to predict the recurrence risk of breast cancer patients using a multimodal deep learning model that integrates multiple-sequence MRI imaging features with clinicopathologic characteristics. Methods: In this retrospective study, we enrolled 574 patients with non-metastatic invasive breast cancer from two Chinese institutions between September 2012 and July 2019. We developed a multimodal deep learning (MDL) model by constructing a multi-instance learning framework based on convolutional neural networks. We integrated imaging features from T2WI, DWI, and DCE-MRI sequences with clinicopathologic features for breast cancer recurrence risk stratification. Subsequently, the performance of the MDL model was evaluated using receiver operating characteristic (ROC) curves, the Hosmer–Lemeshow test, calibration curves, and decision curve analysis (DCA). Survival analysis was conducted with Kaplan–Meier survival curves to stratify breast cancer patients into high- and low-recurrence-risk groups. Time-dependent ROC curves were used to assess 3-year, 5-year, and 7-year recurrence-free survival (RFS) for breast cancer patients. Additionally, we performed differential and enrichment analyses on Oncotype DX genes. We correlated these genes with clinicopathologic features and deep-learning radiographic features using univariate Cox regression and Pearson correlation analysis. Results: The MDL model demonstrated good performance in predicting breast cancer recurrence risk and accurately differentiated between high- and low-recurrence-risk groups, with an AUC as high as 0.915 (95% CI 0.8448–0.9856). The C-index of the prediction models was 0.803 in the testing cohort. The AUCs for 5-year and 7-year RFS were 0.936 (95% CI 0.876–0.997) and 0.956 (95% CI 0.902–1.000) in the validation cohort.
In the testing cohort, these AUCs were 0.836 (95% CI 0.763–0.909) and 0.783 (95% CI 0.676–0.891). This study found a significant correlation between Oncotype DX gene expression, clinicopathologic features, and deep-learning radiographic features (p < 0.05). Conclusions: This study validated the robust predictive accuracy of the MDL model in identifying high- and low-risk groups for recurrence. The correlations identified between Oncotype DX genes, clinicopathologic features, and deep-learning radiographic features offer novel insights for future biomarker research in breast cancer. Supplementary Information: The online version contains supplementary material available at 10.1186/s13058-025-02129-z.

  • Research Article
  • Citations: 7
  • 10.1007/s00259-024-07065-2
PSMA PET/CT based multimodal deep learning model for accurate prediction of pelvic lymph-node metastases in prostate cancer patients identified as candidates for extended pelvic lymph node dissection by preoperative nomograms.
  • Jan 27, 2025
  • European journal of nuclear medicine and molecular imaging
  • Qiaoke Ma + 10 more

To develop and validate a prostate-specific membrane antigen (PSMA) PET/CT based multimodal deep learning model for predicting pathological lymph node invasion (LNI) in prostate cancer (PCa) patients identified as candidates for extended pelvic lymph node dissection (ePLND) by preoperative nomograms. [68Ga]Ga-PSMA-617 PET/CT scans of 116 eligible PCa patients (82 in the training cohort and 34 in the test cohort) who underwent radical prostatectomy with ePLND were analyzed in our study. The Med3D deep learning network was used to extract discriminative features from the entire prostate volume of interest on the PET/CT images. Subsequently, a multimodal model, a multi-kernel support vector machine, was constructed to combine the PET/CT deep learning features, quantitative PET parameters, and clinical parameters. The performance of the multimodal models was assessed using final histopathology as the reference standard, with evaluation metrics including the area under the receiver operating characteristic curve (AUC), calibration curve, and decision curve analysis, and was compared with available nomograms and PET/CT visual evaluation results. Our multimodal model incorporated clinical information, maximum standardized uptake value (SUVmax), and PET/CT deep learning features. The AUC for predicting LNI was 0.89 (95% confidence interval [CI] 0.81-0.97) for the final model. In the test cohort, the proposed model demonstrated superior predictive accuracy compared with PET/CT visual evaluation and with the Memorial Sloan Kettering Cancer Center (MSKCC) and Briganti-2017 nomograms (AUC 0.85 [95% CI 0.69-1.00] vs. 0.80 [95% CI 0.64-0.95] vs. 0.79 [95% CI 0.61-0.97] and 0.69 [95% CI 0.50-0.88], respectively). The proposed model showed similar calibration and higher net benefit compared with the traditional nomograms.
Our multimodal deep learning model, which incorporates preoperative PSMA PET/CT imaging, shows enhanced predictive capability for LNI in clinically localized PCa compared with PSMA PET/CT visual evaluation and existing nomograms such as the MSKCC and Briganti-2017 nomograms. This model has the potential to reduce unnecessary ePLND procedures while minimizing the risk of missing cases of LNI.

  • Research Article
  • Citations: 15
  • 10.1186/s12911-021-01700-w
Prediction of central venous catheter-associated deep venous thrombosis in pediatric critical care settings
  • Nov 27, 2021
  • BMC Medical Informatics and Decision Making
  • Haomin Li + 6 more

Background: An increase in the incidence of central venous catheter (CVC)-associated deep venous thrombosis (CADVT) has been reported in pediatric patients over the past decade. At the same time, current screening guidelines for venous thromboembolism risk have low sensitivity for CADVT in hospitalized children. This study used a multimodal deep learning model to predict CADVT before it occurs. Methods: Children who were admitted to intensive care units (ICUs) between December 2015 and December 2018 and had a CVC in place for at least 3 days were included. The variables analyzed included demographic characteristics, clinical conditions, laboratory test results, vital signs, and medications. A multimodal deep learning (MMDL) model that handles temporal data using long short-term memory (LSTM) and gated recurrent units (GRUs) was proposed for this prediction task. Four benchmark models, logistic regression (LR), random forest (RF), gradient boosting decision tree (GBDT), and a published state-of-the-art MMDL, were compared with the proposed model using fivefold cross-validation. Accuracy, recall, area under the ROC curve (AUC), and average precision (AP) were used to evaluate the discrimination of each model at three time points (24 h, 48 h, and 72 h) before CADVT occurred. The Brier score and Spiegelhalter’s z test were used to measure the calibration of these prediction models. Results: A total of 1830 patients were included in this study, and approximately 15% developed CADVT. In the CADVT prediction task, the proposed model significantly outperformed both the traditional machine learning models and the existing multimodal deep learning model at all 3 time points. It achieved 77% accuracy and 90% recall 24 h before CADVT was discovered.
It can predict the occurrence of CADVT 72 h in advance with an accuracy greater than 75%, a recall of more than 87%, and an AUC of 0.82. Conclusion: In this study, a machine learning method was successfully established to predict CADVT in advance. These findings suggest that artificial intelligence (AI) could support thromboprophylaxis measures in a pediatric intensive care setting.
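The discrimination and calibration measures named in this abstract can be illustrated with a small sketch. The Brier score is the mean squared difference between each predicted probability and the observed 0/1 outcome (lower is better); accuracy and recall require turning probabilities into yes/no predictions, and the 0.5 threshold used below is an assumption, since the abstract does not state the threshold. Function names are illustrative.

```python
from typing import Sequence, Tuple


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def accuracy_and_recall(
    labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Accuracy and recall after thresholding probabilities (threshold is assumed)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(1 for y, yhat in zip(labels, preds) if y == yhat)
    true_pos = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    positives = sum(1 for y in labels if y == 1)
    recall = true_pos / positives if positives else float("nan")
    return correct / len(labels), recall
```

Keeping the Brier score separate from the thresholded metrics mirrors the abstract's distinction between calibration (how trustworthy the probabilities are) and discrimination (how well cases are separated).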

  • Research Article
  • Citations: 21
  • 10.1164/rccm.202304-0767oc
Predicting Obstructive Sleep Apnea Based on Computed Tomography Scans Using Deep Learning Models.
  • Jul 15, 2024
  • American journal of respiratory and critical care medicine
  • Jeong-Whun Kim + 7 more

Rationale: The incidence of clinically undiagnosed obstructive sleep apnea (OSA) is high in the general population because of limited access to polysomnography. Computed tomography (CT) of craniofacial regions obtained for other purposes can be useful for predicting OSA and its severity. Objectives: To predict OSA and its severity from paranasal CT using a three-dimensional deep learning algorithm. Methods: One internal dataset (N = 798) and two external datasets (N = 135 and N = 85) were used in this study. In the internal dataset, 92 normal participants and 159 with mild, 201 with moderate, and 346 with severe OSA were enrolled to derive the deep learning model. A multimodal deep learning model was built by connecting a three-dimensional convolutional neural network-based part that processes unstructured data (CT images) with a multilayer perceptron-based part that processes structured data (age, sex, and body mass index) to predict OSA and its severity. Measurements and Main Results: In a four-class classification for predicting the severity of OSA, the AirwayNet-MM-H model (multimodal model with airway-highlighting preprocessing algorithm) showed an average accuracy of 87.6% (95% confidence interval [CI], 86.8-88.6%) in the internal dataset and 84.0% (95% CI, 83.0-85.1%) and 86.3% (95% CI, 85.3-87.3%) in the two external datasets, respectively. In the two-class classification for predicting significant OSA (moderate to severe OSA), the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, and F1 score were 0.910 (95% CI, 0.899-0.922), 91.0% (95% CI, 90.1-91.9%), 89.9% (95% CI, 88.8-90.9%), 93.5% (95% CI, 92.7-94.3%), and 93.2% (95% CI, 92.5-93.9%), respectively, in the internal dataset.
Furthermore, the AirwayNet-MM-H model outperformed six other state-of-the-art deep learning models in accuracy for both four- and two-class classification and in area under the receiver operating characteristic curve for two-class classification (P < 0.001). Conclusions: A novel deep learning approach, combining a multimodal deep learning model with an airway-highlighting preprocessing algorithm applied to CT images obtained for other purposes, can diagnose OSA with significantly higher accuracy than the compared models.

  • Research Article
  • Citations: 23
  • 10.3390/app122010405
Improving Air Pollution Prediction System through Multimodal Deep Learning Model Optimization
  • Oct 15, 2022
  • Applied Sciences
  • Kyung-Kyu Ko + 1 more

Many forms of air pollution are increasing as science and technology rapidly advance. In particular, fine dust harms the human body, causing or worsening heart- and lung-related diseases. In this study, the fine dust level in Seoul is predicted 8 h ahead so that health damage can be prevented in advance. We construct a dataset that combines two modalities (numerical and image data) for accurate prediction, and we propose a multimodal deep learning model combining Long Short-Term Memory (LSTM) and a Convolutional Neural Network (CNN). An LSTM AutoEncoder is chosen to process the numerical time-series data, and a basic CNN is used for the image data. Visual Geometry Group networks (VGGNet: VGG16 and VGG19) are also chosen as CNN models for image processing to compare performance differences according to network depth. VGGNet is a standard deep CNN architecture with multiple layers. Our multimodal deep learning model using both modalities (numerical and image data) performed better than a single deep learning model using only one modality (numerical data). Specifically, performance improved by up to 14.16% when the deeper VGG19 model was used instead of VGG16.

  • Research Article
  • Citations: 12
  • 10.1109/jbhi.2025.3529348
WavFace: A Multimodal Transformer-Based Model for Depression Screening.
  • May 1, 2025
  • IEEE journal of biomedical and health informatics
  • Ricardo Flores + 3 more

Depression, a prevalent mental health disorder with severe health and economic consequences, can be costly and difficult to detect. To alleviate this burden, recent research has explored the depression screening capabilities of deep learning (DL) models trained on videos of clinical interviews conducted by a virtual agent. Such DL models must address the challenges of modality representation, alignment, and fusion, as well as small sample sizes. To address them, we propose WavFace, a multimodal deep learning model that takes audio and temporal facial features as input. WavFace adds an encoder-transformer layer over pre-trained models to improve the unimodal representations. It also applies an explicit alignment method to both modalities and then uses sequential and spatial self-attention over the aligned features. Finally, WavFace fuses the sequential and spatial self-attention outputs across the two modality embeddings, inspired by how mental health professionals simultaneously observe visual and vocal presentation during clinical interviews. By leveraging sequential and spatial self-attention, WavFace outperforms pre-trained unimodal and multimodal models from the literature. With a single interview question, WavFace screened for depression with a balanced accuracy of 0.81. This presents a valuable modeling approach for audio-visual mental health screening.
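Balanced accuracy, the metric reported for WavFace, is for a binary screening task the mean of sensitivity (recall on positive cases) and specificity (recall on negative cases), so a model cannot score well simply by predicting the majority class. A minimal sketch with illustrative names:

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary 0/1 labels and predictions."""
    if len(labels) != len(preds) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in labels")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Sensitivity 1/2 and specificity 3/4 give a balanced accuracy of 0.625.
print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))  # prints 0.625
```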

  • Research Article
  • Citations: 13
  • 10.3390/w15183190
Exploring the Effect of Meteorological Factors on Predicting Hourly Water Levels Based on CEEMDAN and LSTM
  • Sep 7, 2023
  • Water
  • Zihuang Yan + 2 more

The magnitude of tidal energy depends on changes in ocean water levels, so accurately predicting water level changes can help tidal power plants plan and optimize the timing of power generation to maximize energy harvesting efficiency. Because water levels change over time, water level data form a time series, which is essential for both short- and long-term forecasting. Real-time water level information is essential for studying tidal power, and because the National Oceanic and Atmospheric Administration (NOAA) provides such information, NOAA data are useful for these studies. In this paper, long short-term memory (LSTM) and its variants, stacked long short-term memory (StackLSTM) and bidirectional long short-term memory (BiLSTM), are used to predict water levels at three sites and are compared with classical machine learning algorithms, e.g., support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). This study investigates the effects of wind speed (WS), wind direction (WD), gusts (WG), air temperature (AT), and atmospheric pressure (Baro) on predicting hourly water levels (WL). The results show that, except at the La Jolla site, the highest coefficient of determination (R²) was obtained when all meteorological factors were used as inputs (Burlington station R² = 0.721; Kahului station R² = 0.852). In the final part of this article, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm was introduced into the various models, and the results showed a significant improvement in predicting water levels at each site. Among them, the CEEMDAN-BiLSTM algorithm performed best, with an average RMSE of 0.0759 mh−1 across the three sites.
This indicates that applying the CEEMDAN algorithm to deep learning yields more stable predictive performance for water level forecasting across different regions.
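The coefficient of determination used to compare the meteorological inputs is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observations from their mean. A plain-Python sketch with an illustrative name; note that R² is 1 for a perfect fit, 0 for a model no better than predicting the mean, and negative for a worse one.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot
```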

  • Research Article
  • Citations: 19
  • 10.1007/s00330-022-09031-8
Multimodal deep learning model on interim [18F]FDG PET/CT for predicting primary treatment failure in diffuse large B-cell lymphoma.
  • Aug 27, 2022
  • European Radiology
  • Cheng Yuan + 7 more

Predicting primary treatment failure (PTF) is necessary for patients with diffuse large B-cell lymphoma (DLBCL) because it is a prominent means of improving front-line outcomes. Using interim 18F-fluoro-2-deoxyglucose ([18F]FDG) positron emission tomography/computed tomography (PET/CT) imaging data, we aimed to construct multimodal deep learning (MDL) models to predict possible PTF in low-risk DLBCL. Initially, 205 DLBCL patients undergoing interim [18F]FDG PET/CT scans and the front-line standard of care were included in the primary dataset for model development. Another 44 patients were then included in the external dataset for generalization evaluation. Using a Conv-LSTM network as the backbone, we incorporated five different multimodal fusion strategies (pixel intermixing, separate channel, separate branch, quantitative weighting, and hybrid learning) to make full use of PET/CT features and built five corresponding MDL models. The best of these, the hybrid learning model, was further optimized by integrating a contrastive training objective to improve its prediction performance. The final model with contrastive objective optimization, named the contrastive hybrid learning model, performed best, with an accuracy of 91.22% and an area under the receiver operating characteristic curve (AUC) of 0.926 in the primary dataset. In the external dataset, its accuracy and AUC remained at 88.64% and 0.925, respectively, indicating good generalization ability. The proposed model achieved good performance, validated the predictive value of interim PET/CT, and holds promise for directing individualized clinical treatment.
• The proposed multimodal models achieved accurate prediction of primary treatment failure in DLBCL patients.
• Using an appropriate feature-level fusion strategy can bring samples of the same class close to each other regardless of the modal heterogeneity of the source data and positively impact prediction performance.
• Deep learning validated the predictive value of interim PET/CT in a way that exceeded human capabilities.

  • Research Article
  • Citations: 5
  • 10.2215/cjn.0000000695
Multicenter Development and Validation of a Multimodal Deep Learning Model to Predict Moderate to Severe AKI.
  • Apr 15, 2025
  • Clinical journal of the American Society of Nephrology : CJASN
  • Jay L Koyner + 8 more

Prior models for the early identification of acute kidney injury (AKI) have used structured data (e.g., vital signs and laboratory values). We aimed to develop and validate a deep learning model to predict moderate to severe AKI by combining structured data and information from unstructured notes. Adults (≥18 years) admitted to the University of Wisconsin (2009-20) and the University of Chicago Medicine (2016-22) were eligible for inclusion. Patients were excluded if they had no documented serum creatinine (SCr), had end-stage kidney disease, had an admission SCr ≥3.0 mg/dL, developed ≥Stage 2 AKI before reaching the wards or intensive care unit (ICU), or required dialysis (kidney replacement therapy, KRT) within the first 48 hours. Text from unstructured notes was mapped to standardized Concept Unique Identifiers (CUIs) to create predictor variables, and structured data variables were also included. An intermediate fusion deep learning recurrent neural network architecture was used to predict ≥Stage 2 AKI within the next 48 hours. This multimodal model was developed on the first 80% of the data and temporally validated on the next 20%. There were 339,998 admissions in the derivation cohort and 84,581 in the validation cohort, with 12,748 (3%) developing ≥Stage 2 AKI. Patients with ≥Stage 2 AKI were older, more likely to be male, had higher baseline SCr, and were more commonly in the ICU (p<0.001 for all). The multimodal model outperformed a model based only on structured data for all outcomes, with an area under the receiver operating characteristic curve (95% CI) of 0.88 (0.88-0.88) for predicting ≥Stage 2 AKI and 0.93 (0.93-0.94) for receiving KRT. The area under the precision-recall curve for ≥Stage 2 AKI was 0.20. Results were similar during external validation. We developed and validated a multimodal deep learning model using structured and unstructured data that predicts the development of severe AKI across the hospital stay, allowing earlier intervention.

  • Research Article
  • 10.3389/fcvm.2026.1771669
A multimodal deep learning model for predicting impending rupture in symptomatic abdominal aortic aneurysms using CTA and clinical data.
  • Jan 1, 2026
  • Frontiers in cardiovascular medicine
  • Jiaxin Cheng + 9 more

In hemodynamically stable patients with symptomatic abdominal aortic aneurysms (AAA), timely diagnosis of impending rupture remains a critical challenge. To address this, we developed and validated an interpretable multimodal deep learning model to assess rupture risk and support emergency decision-making. This retrospective cohort study included 263 symptomatic AAA patients, with the most recent year's cases (n = 33) as an independent temporal test set. In the 230-patient development cohort, 75 impending rupture cases were matched 1:1 with 75 stable controls using propensity score for age, sex, and maximum aortic diameter. We developed a multimodal deep learning model that combines sequential CTA slices with six key clinical biomarkers through a bidirectional cross-attention (BCA) mechanism built on a ResNet-50 image encoder. For interpretability, we used Gradient-weighted Class Activation Mapping (Grad-CAM) and conducted pre-specified sensitivity analyses assessing robustness against endpoint decision-dependence, treatment-related data leakage, and domain shifts. In the matched development test set (n = 30), our multimodal model achieved an area under the curve (AUC) of 0.898 with sensitivity and negative predictive value (NPV) both at 93.3%, offering a high safety margin for ruling out rupture. It markedly outperformed two pragmatic clinical baselines (clinical-rule model AUC: 0.751; CTA-sign model 0.778). This strong performance persisted in the independent temporal validation cohort (n = 33), where it attained an AUC of 0.880, sensitivity of 92.9%, and NPV of 87.5%. The proposed BCA fusion outperformed alternative architectures, and Grad-CAM visualizations were anatomically plausible in 78.8% of cases, supporting model interpretability. 
We developed and temporally validated an interpretable multimodal model that integrates CTA and clinical biomarkers to enable rapid AAA rupture risk stratification, offering a clinically relevant improvement in the safety and efficiency of emergency triage over current practice, pending prospective validation.

  • Abstract
  • 10.1002/alz70856_099536
Development of a deep learning model using multimodal data for dementia diagnosis
  • Dec 1, 2025
  • Alzheimer's & Dementia
  • Hee Won Yang + 2 more

Background: This study aims to develop a multimodal deep learning model that integrates voice and drawing data collected during dementia screening tests to improve the accuracy of dementia diagnosis. The study also evaluates the impact of different data modalities on classification performance. Method: A total of 1,091 participants (normal cognition, mild cognitive impairment [MCI], dementia) were included from five university hospitals located in different regions of South Korea. Voice responses were converted into Mel Frequency Cepstral Coefficient (MFCC) spectrograms, and pentagon drawings were preprocessed into grayscale images. DenseNet was used to extract features from the voice and drawing data, while demographic and clinical data were analyzed using a multilayer perceptron (MLP). The final multimodal model combined these modalities using weighted ensemble learning. Result: The multimodal model achieved an accuracy of 66.3% (95% CI: 61.9–70.7) and an AUC of 0.73 (95% CI: 0.70–0.76) for three‐group classification (normal, MCI, dementia). For binary classification (normal vs. dementia), the model achieved an accuracy of 86.9% (95% CI: 84.3–89.4) and an AUC of 0.86 (95% CI: 0.84–0.88). Voice data alone showed strong diagnostic performance, with accuracy and AUC comparable to those of the multimodal models. Drawing data improved performance in multi‐class classification but had limited impact in binary tasks. Clinical data, including MMSE scores and demographic information, provided modest additional contributions to overall model performance. Conclusion: The multimodal deep learning model combining voice, drawing, and clinical data demonstrated promising performance in diagnosing cognitive impairment and dementia. These findings suggest that integrating diverse data modalities can enhance diagnostic accuracy and provide a scalable approach for early detection in clinical and real‐world settings.
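The abstract says the modalities were combined by weighted ensemble learning but gives neither the weights nor the combination rule. One common form is weighted soft voting: each modality's model outputs class probabilities, and the ensemble takes their weighted average. A sketch under that assumption; the function name and the example weights are hypothetical.

```python
from typing import List, Sequence


def weighted_soft_vote(
    probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-modality class-probability vectors.

    probs[m][c] is model m's probability for class c. Weights are normalized
    to sum to 1, so the result is still a probability vector.
    """
    if not probs or len(probs) != len(weights):
        raise ValueError("need one weight per modality model")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    n_classes = len(probs[0])
    if any(len(p) != n_classes for p in probs):
        raise ValueError("all models must score the same classes")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / total
        for c in range(n_classes)
    ]


# Hypothetical voice, drawing, and clinical outputs for (normal, MCI, dementia).
combined = weighted_soft_vote(
    [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2], [0.5, 0.3, 0.2]], weights=[0.5, 0.3, 0.2]
)
```

In such a scheme, the finding that voice data alone performed about as well as the full model would correspond to the voice branch carrying most of the useful signal, whatever weights the authors actually chose.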

  • Research Article
  • 10.1186/s12877-026-07005-9
The sarcopenia artificial intelligence diagnostic decision support system (SAID DSS) – a multimodal deep learning model
  • Jan 26, 2026
  • BMC Geriatrics
  • Kristoffer Kittelmann Brockhattingen + 7 more

Early detection and treatment of sarcopenia are crucial for improving patient outcomes, yet current diagnostic methods often lack the accuracy, accessibility, and efficiency needed for widespread clinical use. The aim of this study was to develop an accurate, secure, and evidence-based multimodal AI model for sarcopenia diagnosis, using a point-of-care ultrasound (POCUS) framework that combines muscle imaging properties with physical performance. The model uses clinical data and POCUS images. Clinical data consisted of age, gender, height, weight, body mass index (BMI), and physical performance measured by Short Physical Performance Battery (SPPB) scores. SPPB scores were chosen because the SPPB is recommended by both the European Working Group on Sarcopenia in Older People 2 and the Asian Working Group for Sarcopenia. POCUS data consisted of longitudinal and transverse images of the rectus femoris muscle of the dominant thigh. Various machine learning (ML) and deep learning (DL) algorithms and multimodal architectures were tested. Explainable AI (XAI) methods, including Grad-CAM for ultrasound images and feature-attribution analysis for clinical variables, were integrated to provide transparent interpretation of the multimodal model’s diagnostic decisions. The final model was implemented as part of the Sarcopenia Artificial Intelligence Diagnostic Decision Support System (SAID DSS). The 24 participants were mostly women (63%), with a mean age of 81 years (± 5.2; range 71–91 years), a mean body mass index of 26 kg/m2 (± 4.1), and mean SPPB scores of 5 (± 1.6) for sarcopenic participants and 9 (± 1.6) for controls. A total of 1060 and 2414 longitudinal and transverse ultrasound events for sarcopenic and control participants, respectively, were used, providing a robust dataset despite the small number of participants.
Comprehensive experimental results showed that a feature-level fusion technique using a multilayer perceptron network as the classifier and Xception architectures for image feature extraction performed best. The final model yielded a diagnostic accuracy of 85%, an F1-score of 0.85, and an area under the curve (AUC) of 0.84, higher than those of previous models. This study is the first to introduce a clinically oriented, AI-based multimodal model for sarcopenia detection, demonstrating improved performance over existing approaches. In addition, we provide an explanation of the decisions generated by the best-performing detection model. By integrating this model into SAID DSS, we provide a practical and scalable tool with potential for direct application in clinical workflows, supporting early and accurate identification of sarcopenia. Not applicable.
