
A Timeseries-based Multimodal Deep Learning Approach for Lung Nodule Growth Prediction.

Abstract

Lung nodules are often benign but can become serious health concerns if their growth is not monitored accurately. Predicting lung nodule growth is critical for improving patient outcomes and guiding clinical decision-making. This study develops a multimodal deep learning approach that improves the accuracy of lung nodule growth prediction by integrating time-series CT image data with demographics and nodule-specific features. Data were collected from the Far Eastern Memorial Hospital, Taiwan, and comprise CT image sequences of lung nodules, patient demographics, and nodule-specific features. Using this dataset, a multimodal deep learning framework was developed and optimized. Model performance was assessed using accuracy, precision, sensitivity, F1-score, and AUC. The proposed framework substantially outperformed traditional machine learning and unimodal models. Among all configurations, the repeat frame strategy achieved the best overall performance, with an accuracy of 0.929, precision of 0.878, sensitivity of 0.908, F1-score of 0.878, and AUC of 0.977. Paired t-test analysis confirmed that these improvements were statistically significant (p < 0.05) compared to other multimodal variants and baseline models. These results highlight the model's ability to integrate image, demographic, and nodule-specific features effectively, leading to superior predictive accuracy and robust potential for clinical decision support. By combining time-series CT image data with demographics and nodule-specific features, the proposed framework provides a reliable tool for predicting lung nodule growth. This has significant implications for lung nodule management, offering clinicians a dependable resource to support medical decision-making and improve patient care. The findings highlight the transformative potential of deep learning techniques in critical healthcare domains.
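The evaluation metrics named in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, the confusion-matrix counts, and the scores below are hypothetical and chosen only to show how accuracy, precision, sensitivity, F1-score, and AUC are commonly computed for a binary growth / no-growth label.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


def auc_from_scores(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
m = classification_metrics(tp=8, fp=2, tn=9, fn=1)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(m, auc)
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based formulation would be the usual choice for larger cohorts.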

Similar Papers
  • Research Article
  • Cite Count: 10
  • 10.1108/jhtt-04-2023-0098
Sarcasm detection in hotel reviews: a multimodal deep learning approach
  • May 27, 2024
  • Journal of Hospitality and Tourism Technology
  • Yang Liu + 2 more

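Purpose: This study detects consumer sarcasm by analyzing inconsistencies in sentiment features between the text and images of hotel reviews. Methodology: This paper proposes a sarcasm detection model based on multimodal deep learning. Using reviews of three hotel brands collected from two travel platforms, the model identifies sentiment inconsistencies both within and between modalities. A graph neural network (GNN) is used to explore text-image interaction information in order to detect key cues of sarcastic sentiment. Findings: The results show that the multimodal deep learning model outperforms other baseline models, which helps in understanding hotel service evaluations and provides decision-making suggestions for hotel managers. Originality: The study can help hoteliers in two ways: detecting service quality and formulating strategies. By selecting reference hotel brands, hoteliers can better assess their own level of service quality (and, in turn, allocate resources optimally), so sarcasm detection research benefits more than just hotel managers seeking to improve service quality. The multimodal deep learning approach introduced in this study can be replicated in other industries and can help travel platforms optimize their products and services.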

  • Research Article
  • 10.3389/fpubh.2025.1687335
MDL-CA: a multimodal deep learning approach with a cross attention mechanism for accurate brain cancer diagnosis
  • Jan 5, 2026
  • Frontiers in Public Health
  • Sumaira Sarwar + 4 more

Introduction: Brain cancer diagnosis poses a significant clinical challenge due to the complex interplay between molecular mechanisms and anatomical abnormalities. Traditional diagnostic techniques, including invasive biopsies, isolated genomic assays, and standalone Magnetic Resonance Imaging (MRI), often exhibit limitations such as procedural risks, inadequate sensitivity, and incomplete assessment of tumor heterogeneity. These shortcomings contribute to delayed diagnosis, inaccurate tumor grading, and suboptimal treatment planning. Furthermore, single-modality data, whether MRI or genomic profiles, frequently yield limited diagnostic accuracy and biological interpretability. Methods: To address these limitations, this study proposes MDL-CA, a Multimodal Deep Learning framework with a Cross-Attention mechanism, designed to integrate genomic and MRI modalities for enhanced brain cancer diagnosis. The framework fuses genomic graph embeddings, extracted using a Graph Attention Network (GAT), with MRI feature maps derived from a 3D DenseNet. The cross-modal attention fusion mechanism enables the model to capture intricate biological and spatial interactions, producing a biologically informed feature representation. Additionally, the Entmax sigmoid function is employed in the classification stage to promote sparsity and improve interpretability. Data were sourced from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA) following comprehensive preprocessing. Results: Extensive experiments conducted across four benchmark datasets demonstrated that MDL-CA achieved superior diagnostic performance, with accuracies of 96.22%, 97.14%, 98.46%, and 98.21%, and F1-scores ranging from 95.95% to 98.40%. These results confirm the framework’s robustness, scalability, and consistent generalization across diverse datasets. Discussion: The integration of genomic and MRI data through the proposed cross-attention mechanism enables deeper biological understanding and improved diagnostic precision compared to single-modality and conventional fusion approaches. By effectively modeling interactions between molecular and anatomical features, MDL-CA advances the development of biologically informed, multimodal diagnostic systems for brain cancer. The results highlight the framework’s potential to support early diagnosis and personalized treatment planning in clinical practice.

  • Research Article
  • Cite Count: 2
  • 10.1038/s41598-025-96052-0
An enhanced CNN-Bi-transformer based framework for detection of neurological illnesses through neurocardiac data fusion
  • Apr 3, 2025
  • Scientific Reports
  • Kavita Rawat + 1 more

Classical approaches to diagnosis frequently rely on self-reported symptoms or clinician observations, which can make it difficult to examine mental health illnesses due to their subjective and complicated nature. In this work, we offer an innovative methodology for predicting mental illnesses such as epilepsy, sleep disorders, bipolar disorder, eating disorders, and depression using a multimodal deep learning framework that integrates neurocardiac data fusion. The proposed framework combines MEG, EEG, and ECG signals to create a more comprehensive understanding of brain and cardiac function in individuals with mental disorders. The multimodal deep learning approach uses an integrated CNN-Bi-Transformer, i.e., CardioNeuroFusionNet, which can process multiple types of inputs simultaneously, allowing for the fusion of various modalities and improving the performance of the predictive representation. The proposed framework has undergone testing on data from the Deep BCI Scalp Database and was further validated on the Kymata Atlas dataset to assess its generalizability. The model achieved promising results with high accuracy (98.54%) and sensitivity (97.77%) in predicting mental problems, including neurological and psychiatric conditions. The neurocardiac data fusion has been found to provide additional insights into the relationship between brain and cardiac function in neurological conditions, which could potentially lead to more accurate diagnosis and personalized treatment options. The suggested method overcomes the shortcomings of earlier studies, which tended to concentrate on single-modality data, lacked thorough neurocardiac data fusion, and made use of less advanced machine learning algorithms. The comprehensive experimental findings, which provide an average improvement in accuracy of 2.72%, demonstrate that the suggested work performs better than other cutting-edge AI techniques and generalizes effectively across diverse datasets.

  • Research Article
  • Cite Count: 4
  • 10.1016/j.jdent.2023.104588
Multi-modal deep learning for automated assembly of periapical radiographs
  • Jun 21, 2023
  • Journal of Dentistry
  • L Pfänder + 5 more

  • Research Article
  • Cite Count: 1
  • 10.1051/0004-6361/202553751
Estimation of age and metallicity for galaxies based on multi-modal deep learning
  • Jun 1, 2025
  • Astronomy & Astrophysics
  • Ping Li + 4 more

Aims. This study is aimed at deriving the age and metallicity of galaxies by proposing a novel multi-modal deep learning framework. This multi-modal framework integrates spectral and photometric data, offering advantages in cases where spectra are incomplete or unavailable. Methods. We propose a multi-modal learning method for estimating the age and metallicity of galaxies (MMLforGalAM). This method uses two modalities, spectra and photometric images, as training samples. Its architecture consists of four models: a spectral feature extraction model (ℳ1), a simulated spectral feature generation model (ℳ2), an image feature extraction model (ℳ3), and a multi-modal attention regression model (ℳ4). Specifically, ℳ1 extracts spectral features associated with age and metallicity from spectra observed by the Sloan Digital Sky Survey (SDSS). These features are then used as labels to train ℳ2, which generates simulated spectral features for photometric images to address the challenge of missing observed spectra for some images. Overall, ℳ1 and ℳ2 provide a transformation from photometric to spectral features, with the goal of constructing a spectral representation of data pairs (photometric and spectral features) for multi-modal learning. Once ℳ2 is trained, MMLforGalAM can then be applied to scenarios with only images, even in the absence of spectra. Then, ℳ3 processes SDSS photometric images to extract features related to age and metallicity. Finally, ℳ4 combines the simulated spectral features from ℳ2 with the extracted image features from ℳ3 to predict the age and metallicity of galaxies. Results. Trained on 36278 galaxies from SDSS, our model predicts the stellar age and metallicity, with a scatter of 1σ = 0.1506 dex for age and 1σ = 0.1402 dex for metallicity. Compared to a single-modal model trained using only images, the multi-modal approach reduces the scatter by 27% for age and 15% for metallicity.

  • Research Article
  • Cite Count: 7
  • 10.1038/s41598-025-10512-1
A multimodal deep reinforcement learning approach for IoT-driven adaptive scheduling and robustness optimization in global logistics networks
  • Jul 12, 2025
  • Scientific Reports
  • Yao Lu

This paper presents an approach for adaptive scheduling and robustness optimization in global logistics networks by integrating multimodal deep reinforcement learning with Internet of Things (IoT) technologies. We propose an integrated framework comprising a multimodal data fusion mechanism that synthesizes heterogeneous IoT sensor data, historical records, and contextual information; an adaptive deep reinforcement learning architecture that generates dynamic scheduling policies; and a multi-objective robust optimization method that balances operational efficiency with system resilience. The framework addresses key challenges in global logistics including demand volatility, transportation disruptions, and environmental uncertainties. Comprehensive experiments conducted on real-world logistics datasets demonstrate that our approach outperforms traditional methods with an 18.7% reduction in operational costs, 12.4% improvement in service levels, and significantly enhanced robustness under various disruption scenarios. The proposed method maintains 83% performance stability during complex disruptions compared to 51–72% for alternative approaches, while keeping computational requirements feasible for practical deployment. This research demonstrates potential contributions to AI-driven logistics operations management by showing improved supply chain performance through multimodal learning and robust optimization techniques.

  • Research Article
  • Cite Count: 1299
  • 10.1109/tgrs.2020.3016820
More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification
  • Aug 16, 2020
  • IEEE Transactions on Geoscience and Remote Sensing
  • Danfeng Hong + 6 more

Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS) and have attracted growing attention owing to the recent advancements of deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to the limitation of information diversity. In this work, we provide a baseline solution to the aforementioned difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML), cross-modality learning (CML), that exists widely in RS image classification applications. By focusing on "what", "where", and "how" to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and then unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS datasets. Furthermore, the codes and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.

  • Research Article
  • Cite Count: 33
  • 10.1186/s12859-019-3084-y
Multimodal deep representation learning for protein interaction identification and protein family classification
  • Dec 1, 2019
  • BMC Bioinformatics
  • Da Zhang + 1 more

Background: Protein-protein interactions (PPIs) are constantly involved in dynamic pathological and biological processes. It is therefore crucial to understand PPIs thoroughly in order to illuminate disease occurrence, achieve the optimal drug-target therapeutic effect, and describe protein complex structures. However, compared to the protein sequences obtainable from various species and organisms, the number of revealed protein-protein interactions is relatively limited. To address this dilemma, many research efforts have sought to facilitate the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that incorporates protein physicochemical features with the graph topological features from the PPI networks. Specifically, our method not only takes the protein sequence information into account but also learns the topological representations for each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study the PPI predictions. Following that, we use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining the PPI networks.

  • Research Article
  • 10.55041/ijsrem59205
Detecting Deception: A Multimodal Deep Learning Approach for Fake News Identification Using Text and Social Signals
  • Apr 5, 2026
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Rasamsetti Naresh + 4 more

The spread of fake news on social media is a major threat to public discourse, democracy, and trust in society. Conventional unimodal methodologies that depend exclusively on textual content have proven inadequate for capturing the intricate dynamics of misinformation dissemination. This paper presents a multimodal deep learning framework that combines text with social signals to improve fake news detection. We use recent transformer architectures to encode text, graph neural networks to model how social information spreads, and adaptive fusion mechanisms to combine content features with social context. The proposed methodology addresses significant deficiencies in the current literature, specifically the insufficient acquisition of structural social information and the discordance between content and social modalities. By systematically analyzing recent studies, we show that multimodal approaches consistently outperform unimodal baselines. For example, on benchmark datasets, the accuracies were 94.3% and the F1-scores were 92.8%. This work integrates contemporary methodological trends, delineates enduring research deficiencies, and introduces an innovative framework that advances automated fake news detection by modeling the interaction between content semantics and social propagation dynamics. Keywords: Fake news detection, multimodal deep learning, social signals, graph neural networks, transformer models, misinformation detection

  • Research Article
  • Cite Count: 2
  • 10.1002/hbm.70396
A Multimodal Deep Learning Approach for White Matter Shape Prediction in Diffusion MRI Tractography
  • Oct 31, 2025
  • Human Brain Mapping
  • Yui Lo + 10 more

Recently, shape measures have emerged as promising descriptors of white matter tractography, offering complementary insights into anatomical variability and associations with cognitive and clinical phenotypes. However, conventional methods for computing shape measures are computationally expensive and time-consuming for large-scale datasets due to reliance on voxel-based representations. To address these limitations, we introduce Tract2Shape, a novel multimodal deep learning framework that integrates geometric streamline features (as point clouds) with scalar data descriptors (as tabular data) from tractography to predict 10 white matter tractography shape measures. We propose a Siamese architecture in which each subnetwork incorporates a dual-encoder design, enabling each encoder to learn modality-specific representations. To enhance model efficiency, we utilize a dimensionality reduction algorithm for the model to predict five primary shape components. The model is trained and evaluated on two independently acquired datasets: the Human Connectome Project minimally preprocessed young adults (HCP-YA) dataset and the Parkinson's Progression Markers Initiative (PPMI) dataset. Tract2Shape is trained and tested on the HCP-YA dataset, with performance compared against state-of-the-art models. To assess robustness and generalization, we further evaluate the model on the unseen PPMI dataset. Tract2Shape outperforms state-of-the-art deep learning models across all 10 shape measures, achieving the highest average Pearson's r and the lowest normalized mean squared error (nMSE) on the HCP-YA dataset. The ablation study shows that both multimodal input and PCA benefit performance. On the unseen testing PPMI dataset, Tract2Shape maintains a high Pearson's r and low nMSE, demonstrating strong generalizability in cross-dataset evaluation. In comparison with traditional voxel-representation-based shape computation, Tract2Shape achieves a 99.2% improvement in efficiency (< 0.1 s per subject). Tract2Shape enables fast, accurate, and generalizable prediction of white matter shape measures from tractography data, supporting scalable analysis across datasets. This framework lays a promising foundation for future large-scale white matter shape analysis.

  • Research Article
  • Cite Count: 37
  • 10.1016/j.media.2022.102465
A multimodal deep learning model for cardiac resynchronisation therapy response prediction.
  • Jul 1, 2022
  • Medical image analysis
  • Esther Puyol-Antón + 7 more

We present a novel multimodal deep learning framework for cardiac resynchronisation therapy (CRT) response prediction from 2D echocardiography and cardiac magnetic resonance (CMR) data. The proposed method first uses the 'nnU-Net' segmentation model to extract segmentations of the heart over the full cardiac cycle from the two modalities. Next, a multimodal deep learning classifier is used for CRT response prediction, which combines the latent spaces of the segmentation models of the two modalities. At test time, this framework can be used with 2D echocardiography data only, whilst taking advantage of the implicit relationship between CMR and echocardiography features learnt from the model. We evaluate our pipeline on a cohort of 50 CRT patients for whom paired echocardiography/CMR data were available, and results show that the proposed multimodal classifier results in a statistically significant improvement in accuracy compared to the baseline approach that uses only 2D echocardiography data. The combination of multimodal data enables CRT response to be predicted with 77.38% accuracy (83.33% sensitivity and 71.43% specificity), which is comparable with the current state-of-the-art in machine learning-based CRT response prediction. Our work represents the first multimodal deep learning approach for CRT response prediction.
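The relationship between the three figures reported above can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that responders and non-responders are equally represented (or, equivalently, that the reported accuracy is a balanced accuracy). The abstract does not state the class split, so `positive_fraction=0.5` is a hypothetical value, as is the function name.

```python
def accuracy_from_rates(sensitivity, specificity, positive_fraction):
    """Overall accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return (positive_fraction * sensitivity
            + (1.0 - positive_fraction) * specificity)


# Sensitivity and specificity as reported in the abstract;
# the 0.5 class split is an assumption, not a reported value.
acc = accuracy_from_rates(0.8333, 0.7143, positive_fraction=0.5)
print(f"{acc:.4f}")
```

Under that assumption the mean of 83.33% and 71.43% is 77.38%, which matches the reported accuracy. A different class split would give a different weighted value.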

  • Research Article
  • 10.1016/j.synbio.2026.02.003
Functional customization of peptide linkers in fusion proteins through multimodal deep learning approach
  • Mar 10, 2026
  • Synthetic and Systems Biotechnology
  • Zhong Li + 7 more

Peptide linkers are critical modulators of function in fusion proteins, a foundational technology in modern bioengineering. However, the rational customization of linkers for specific applications remains challenging, hindered by an insufficient understanding of the relationship between linker sequences and fused protein function. In this study, we systematically characterized 370 diverse linkers, generated from random 18–amino acid sequences with no homology to known proteins, fusing sfGFP to a nanobody. Although sfGFP fluorescence exhibited no clear correlation with canonical linker properties like flexibility or rigidity, we identified a correlation between amino acid composition and functional output. Furthermore, AlphaFold-predicted substructures encompassing the linker and adjacent sfGFP regions revealed considerable structural diversity while maintaining the overall sfGFP fold. Notably, in silico structural features derived from the Cα–Cα distance matrix of these predicted substructures correlated with fluorescence, providing a structural rationale for the functional variation. By training on both sequence representations and in silico substructural features, we developed a multimodal deep learning framework to quantitatively customize linker sequences for high sfGFP fluorescence in special fusion constructs. This work presents a generalizable framework for engineering peptide linkers to assemble highly functional fusion proteins.

  • Research Article
  • Cite Count: 1
  • 10.1158/1538-7445.am2024-2313
Abstract 2313: Multi-modal deep learning to predict cancer outcomes by integrating radiology and pathology images
  • Mar 22, 2024
  • Cancer Research
  • Zhe Li + 2 more

Purpose: Cancer patients routinely undergo radiologic and pathologic evaluation for their diagnostic workup. These data modalities represent a valuable and readily available resource for developing new prognostic tools. Given their vast difference in spatial scales, effective methods to integrate the two modalities are currently lacking. Here, we aim to develop a multi-modal approach to integrate radiology and pathology images for predicting outcomes in cancer patients. Methods: We propose a multi-modal weakly-supervised deep learning framework to integrate radiology and pathology images for survival prediction. We first extract multi-scale features from whole-slide H&E-stained pathology images to characterize cellular and tissue phenotypes as well as spatial cellular organization. We then build a hierarchical co-attention transformer to effectively learn the multi-modal interactions between radiology and pathology image features. Finally, a multimodal risk score is derived by combining complementary information from the two image modalities and clinical data for predicting outcome. We evaluate our approach in lung, gastric, and brain cancers with matched radiology and pathology images and clinical data available, each with separate training and external validation cohorts. Results: The multi-modal deep learning models achieved a reasonably high accuracy for predicting survival outcomes in the external validation cohorts (C-index range: 0.72-0.75 across three cancer types). The multi-modal prognostic models significantly improved upon single-modal approaches based on radiology or pathology images or clinical data alone (C-index range: 0.53-0.71, P<0.01). The multi-modal deep learning models were significantly associated with disease-free survival and overall survival (hazard ratio range: 3.23-4.46, P<0.0001). In multivariable analyses, the models remained an independent prognostic factor (P<0.01) after adjusting for clinicopathological variables including cancer stage and tumor differentiation. Conclusions: The proposed multi-modal deep learning approach outperforms traditional methods for predicting survival outcomes by leveraging routinely available radiology and pathology images. With further independent validation, this may afford a promising approach to improve risk stratification and better inform treatment strategies for cancer patients. Citation Format: Zhe Li, Yuming Jiang, Ruijiang Li. Multi-modal deep learning to predict cancer outcomes by integrating radiology and pathology images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2313.
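For readers unfamiliar with the C-index used above, the following is a minimal sketch of a Harrell-style concordance index. It is a generic illustration rather than the authors' evaluation code. The function name and the toy data are hypothetical, and pairs with tied event times are simply skipped, which is a simplifying assumption.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before time j. The pair is concordant when the
    earlier-failing patient has the higher risk score; tied risks count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, for illustration only: follow-up times, event flags, model risk scores.
c = concordance_index(times=[2, 4, 6, 8],
                      events=[1, 1, 0, 1],
                      risks=[0.9, 0.3, 0.5, 0.1])
print(c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported ranges of 0.72-0.75 (multi-modal) and 0.53-0.71 (single-modal).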

  • Research Article
  • Cite Count: 5
  • 10.3390/curroncol31110530
Clinically Significant Prostate Cancer Prediction Using Multimodal Deep Learning with Prostate-Specific Antigen Restriction.
  • Nov 15, 2024
  • Current oncology (Toronto, Ont.)
  • Hayato Takeda + 18 more

Prostate cancer (PCa) is a clinically heterogeneous disease. Predicting clinically significant PCa with low-intermediate prostate-specific antigen (PSA), which often includes aggressive cancers, is imperative. This study evaluated the predictive accuracy of deep learning analysis using multimodal medical data focused on clinically significant PCa in patients with PSA ≤ 20 ng/mL. Our cohort study included 178 consecutive patients who underwent ultrasound-guided prostate biopsy. Deep learning analyses were applied to predict clinically significant PCa. We generated receiver operating characteristic curves and calculated the corresponding area under the curve (AUC) to assess the prediction. The AUC of the integrated medical data using our multimodal deep learning approach was 0.878 (95% confidence interval [CI]: 0.772-0.984) in all patients without PSA restriction. Despite the reduced predictive ability of PSA when restricted to PSA ≤ 20 ng/mL (n = 122), the AUC was 0.862 (95% CI: 0.723-1.000), complemented by imaging data. In addition, we assessed clinical presentations and images belonging to representative false-negative and false-positive cases. Our multimodal deep learning approach assists physicians in determining treatment strategies by predicting clinically significant PCa in patients with PSA ≤ 20 ng/mL before biopsy, contributing to personalized medical workflows for PCa management.

  • Supplementary Content
  • Cite Count: 28
  • 10.1093/genetics/iyae161
A review of multimodal deep learning methods for genomic-enabled prediction in plant breeding
  • Nov 5, 2024
  • Genetics
  • Osval A Montesinos-López + 9 more

Deep learning methods have been applied when working to enhance the prediction accuracy of traditional statistical methods in the field of plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has proven to have some limitations, since its conventional methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce some theoretical basic concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as the available strategies to fuse data from different modalities. We mention some of the available computational resources for the practical implementation of multimodal deep learning problems. We finally performed a review of applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the practical performance of multimodal deep learning methods to highlight how these tools can help address complex problems in the field of plant breeding. We discussed some relevant considerations that researchers should keep in mind when applying multimodal deep learning methods. Multimodal deep learning holds significant potential for various fields, including genomic selection. While multimodal deep learning displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply multimodal deep learning in genomic selection, suitable architectures and fusion strategies must be chosen. It is relevant to keep in mind that multimodal deep learning, like unimodal deep learning, is a powerful tool but should be carefully applied. Given its predictive edge over traditional methods, multimodal deep learning is valuable in addressing challenges in plant breeding and food security amid a growing global population.
