Reviewing Multimodal Machine Learning and Its Use in Cardiovascular Diseases Detection

Abstract

Machine Learning (ML) and Deep Learning (DL) are subfields of Artificial Intelligence (AI) that have already demonstrated their effectiveness in a variety of domains, including healthcare, where they are now routinely integrated into patients’ daily activities. On the other hand, data heterogeneity has long been a key obstacle in AI, ML and DL. Here, Multimodal Machine Learning (Multimodal ML) has emerged as a method that enables the training of complex ML and DL models that use heterogeneous data in their learning process. In addition, Multimodal ML enables the integration of multiple models in the search for a single, comprehensive solution to a complex problem. In this review, the technical aspects of Multimodal ML are discussed, including a definition of the technology and its technical underpinnings, especially data fusion. The review also outlines the differences between this technology and others, such as Ensemble Learning, as well as the various workflows that can be followed in Multimodal ML. In addition, this article examines in depth the use of Multimodal ML in the detection and prediction of Cardiovascular Diseases, highlighting the results obtained so far and the possible starting points for improving its use in the aforementioned field. Finally, a number of the most common problems hindering the development of this technology and potential solutions that could be pursued in future studies are outlined.
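
The abstract names data fusion as the technical core of Multimodal ML but gives no detail. As a rough illustration only, the sketch below contrasts two commonly described fusion styles: feature-level (early) fusion, which concatenates per-sample features before a single model is trained, and decision-level (late) fusion, which combines the outputs of per-modality models. The function names, the ECG and imaging modalities, and all numbers are hypothetical and are not taken from the review.

```python
def early_fusion(*modalities):
    """Feature-level fusion: concatenate each sample's feature vectors."""
    fused = []
    for sample in zip(*modalities):  # one feature vector per modality
        row = []
        for features in sample:
            row.extend(features)
        fused.append(row)
    return fused


def late_fusion(probabilities, weights=None):
    """Decision-level fusion: weighted average of per-modality class probabilities."""
    weights = weights or [1.0] * len(probabilities)
    total = sum(weights)
    fused = []
    for sample in zip(*probabilities):  # one probability vector per modality
        n_classes = len(sample[0])
        fused.append([
            sum(w * p[c] for w, p in zip(weights, sample)) / total
            for c in range(n_classes)
        ])
    return fused


# Hypothetical inputs: two samples, two modalities.
ecg_features = [[0.1, 0.2], [0.3, 0.4]]
image_features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
fused_features = early_fusion(ecg_features, image_features)

# Hypothetical per-modality class probabilities (healthy, disease).
ecg_probs = [[0.9, 0.1], [0.4, 0.6]]
image_probs = [[0.7, 0.3], [0.6, 0.4]]
fused_probs = late_fusion([ecg_probs, image_probs])
```

In this sketch the early-fusion output would feed one downstream model, while the late-fusion output is already a combined prediction; which style suits a given cardiovascular dataset is a design choice the abstract does not settle.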

Similar Papers
  • Research Article
  • Cite Count: 28
  • 10.18034/ajhal.v4i2.658
Analysis of Multimodal Data Using Deep Learning and Machine Learning
  • Dec 31, 2017
  • Asian Journal of Humanity, Art and Literature
  • Swetha Reddy Thodupunori

A modality is the way in which something happens or is experienced. Life is multimodal: we see, hear, smell, feel, and taste. Multimodal experiences involve several such modalities. To understand our surroundings, artificial intelligence must be able to interpret multimodal input. Multimodal machine learning models process and relate input from several modalities. It is a multidisciplinary field with great potential. In this study, we analyze emerging multimodal machine learning technologies and categorize them scientifically rather than focusing on specific multimodal applications. Multimodal machine learning offers more potential and problems than classifications. Most multimodal learning research collects quantitative data from polls and surveys. This research reviews a detailed library of observational studies on multimodal data (MMD) skills for human learning using artificial intelligence-powered approaches, including Machine Learning and Deep Learning. It also describes how MMD has improved learning and in what environments. The paper discusses multimodal learning, its ongoing improvements, and approaches to improving learning. Finally, future researchers should carefully consider building a system that aligns multimodal aspects with the study and learning plan. These elements could enhance multimodal learning by facilitating theory and practice activities. This research lays the groundwork for multimodal data use in future learning technologies and development.

  • Research Article
  • Cite Count: 4
  • 10.1016/j.jdent.2023.104588
Multi-modal deep learning for automated assembly of periapical radiographs
  • Jun 21, 2023
  • Journal of Dentistry
  • L Pfänder + 5 more

  • Research Article
  • Cite Count: 6
  • 10.13374/j.issn2095-9389.2019.03.21.003
A survey of multimodal machine learning
  • May 1, 2020
  • SHILAP Revista de lepidopterología
  • Peng Chen + 5 more

“Big data” is always collected from different resources that have different data structures. With the rapid development of information technologies, today's valuable data resources are characteristically multimodal. As a result, building on classical machine learning strategies, multi-modal learning has become a valuable research topic, enabling computers to process and understand “big data”. The cognitive processes of humans involve perception through different sense organs. Signals from eyes, ears, the nose, and hands (tactile sense) constitute a person’s understanding of a specific scene or the world as a whole. It is reasonable to believe that multi-modal methods with a greater ability to process complex heterogeneous data can further promote the progress of information technologies. The concepts of multimodality stemmed from psychology and pedagogy hundreds of years ago and have been popular in computer science during the past decade. In contrast to the concept of “media”, a “mode” is a more fine-grained concept that is associated with a typical data source or data form. The effective utilization of multi-modal data can help a computer understand a specific environment in a more holistic way. In this context, we first introduced the definition and main tasks of multi-modal learning. Based on this information, the mechanism and origin of multi-modal machine learning were then briefly introduced. Subsequently, statistical learning methods and deep learning methods for multi-modal tasks were comprehensively summarized. We also introduced the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, novel adversarial learning strategies for cross-modal matching or generation were reviewed. The main methods for multi-modal learning were outlined in this paper with a focus on future research issues in this field.

  • Research Article
  • Cite Count: 23
  • 10.1038/s41598-024-66481-4
Explainable artificial intelligence (XAI) for predicting the need for intubation in methanol-poisoned patients: a study comparing deep and machine learning models
  • Jul 8, 2024
  • Scientific Reports
  • Khadijeh Moulaei + 14 more

The need for intubation in methanol-poisoned patients, if not predicted in time, can lead to irreparable complications and even death. Artificial intelligence (AI) techniques like machine learning (ML) and deep learning (DL) greatly aid in accurately predicting intubation needs for methanol-poisoned patients. So, our study aims to assess Explainable Artificial Intelligence (XAI) for predicting intubation necessity in methanol-poisoned patients, comparing deep learning and machine learning models. This study analyzed a dataset of 897 patient records from Loghman Hakim Hospital in Tehran, Iran, encompassing cases of methanol poisoning, including those requiring intubation (202 cases) and those not requiring it (695 cases). Eight established ML (SVM, XGB, DT, RF) and DL (DNN, FNN, LSTM, CNN) models were used. Techniques such as tenfold cross-validation and hyperparameter tuning were applied to prevent overfitting. The study also focused on interpretability through SHAP and LIME methods. Model performance was evaluated based on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed superior performance in accuracy (94.0%), sensitivity (99.0%), specificity (94.0%), and F1-score (97.0%). CNN led in ROC with 78.0%. For ML models, RF excelled in accuracy (97.0%) and specificity (100%), followed by XGB with sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%). Overall, RF and XGB outperformed other models, with accuracy (97.0%) and specificity (100%) for RF, and sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%) for XGB. ML models surpassed DL models across all metrics, with accuracies from 93.0% to 97.0% for DL and 93.0% to 99.0% for ML. Sensitivities ranged from 98.0% to 99.37% for DL and 93.0% to 99.0% for ML. DL models achieved specificities from 78.0% to 94.0%, while ML models ranged from 93.0% to 100%. F1-scores for DL were between 93.0% and 97.0%, and for ML between 96.0% and 98.27%. 
DL models scored ROC between 68.0% and 78.0%, while ML models ranged from 84.0% to 96.08%. Key features for predicting intubation necessity include GCS at admission, ICU admission, age, longer folic acid therapy duration, elevated BUN and AST levels, VBG_HCO3 at initial record, and hemodialysis presence. This study showcases XAI's effectiveness in predicting intubation necessity in methanol-poisoned patients. ML models, particularly RF and XGB, outperform their DL counterparts, underscoring their potential for clinical decision-making.
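
The metrics this abstract reports (accuracy, sensitivity, specificity, F1-score) all derive from the four cells of a binary confusion matrix. A minimal sketch of those definitions follows; the function name and the counts are hypothetical and are not taken from the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

Because the study's classes are imbalanced (202 intubated versus 695 not), accuracy alone can look strong even when sensitivity is poor, which is presumably why the abstract reports all four metrics together.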

  • Research Article
  • Cite Count: 6
  • 10.21271/zjpas.34.2.3
Comprehensive Study for Breast Cancer Using Deep Learning and Traditional Machine Learning
  • Apr 12, 2022
  • ZANCO JOURNAL OF PURE AND APPLIED SCIENCES
  • Chiman Haydar Salh + 1 more

  • Research Article
  • 10.1145/3476415.3476435
Increasing trust in complex machine learning systems
  • Jun 1, 2021
  • ACM SIGIR Forum
  • Jaehun Kim

Machine learning (ML) has become a core technology for many real-world applications. Modern ML models are applied to unprecedentedly complex and difficult challenges, including very large and subjective problems. For instance, applications towards multimedia understanding have been advanced substantially. Here, it is already prevalent that cultural/artistic objects such as music and videos are analyzed and served to users according to their preference, enabled through ML techniques. One of the most recent breakthroughs in ML is Deep Learning (DL), which has been immensely adopted to tackle such complex problems. DL allows for higher learning capacity, making end-to-end learning possible, which reduces the need for substantial engineering effort, while achieving high effectiveness. At the same time, this also makes DL models more complex than conventional ML models. Reports in several domains indicate that such more complex ML models may have potentially critical hidden problems: various biases embedded in the training data can emerge in the prediction, extremely sensitive models can make unaccountable mistakes. Furthermore, the black-box nature of the DL models hinders the interpretation of the mechanisms behind them. Such unexpected drawbacks result in a significant impact on the trustworthiness of the systems in which the ML models are equipped as the core apparatus. In this thesis, a series of studies investigates aspects of trustworthiness for complex ML applications, namely the reliability and explainability. Specifically, we focus on music as the primary domain of interest, considering its complexity and subjectivity. Due to this nature of music, ML models for music are necessarily complex for achieving meaningful effectiveness. As such, the reliability and explainability of music ML models are crucial in the field. The first main chapter of the thesis investigates the transferability of the neural network in the Music Information Retrieval (MIR) context. 
Transfer learning, where the pre-trained ML models are used as off-the-shelf modules for the task at hand, has become one of the major ML practices. It is helpful since a substantial amount of the information is already encoded in the pre-trained models, which allows the model to achieve high effectiveness even when the amount of the dataset for the current task is scarce. However, this may not always be true if the "source" task which pre-trained the model shares little commonality with the "target" task at hand. An experiment including multiple "source" tasks and "target" tasks was conducted to examine the conditions which have a positive effect on the transferability. The result of the experiment suggests that the number of source tasks is a major factor of transferability. Simultaneously, it is less evident that there is a single source task that is universally effective on multiple target tasks. Overall, we conclude that considering multiple pre-trained models or pre-training a model employing heterogeneous source tasks can increase the chance for successful transfer learning. The second major work investigates the robustness of the DL models in the transfer learning context. The hypothesis is that the DL models can be susceptible to imperceptible noise on the input. This may drastically shift the analysis of similarity among inputs, which is undesirable for tasks such as information retrieval. Several DL models pre-trained in MIR tasks are examined for a set of plausible perturbations in a real-world setup. Based on a proposed sensitivity measure, the experimental results indicate that all the DL models were substantially vulnerable to perturbations, compared to a traditional feature encoder. They also suggest that the experimental framework can be used to test the pre-trained DL models for measuring robustness. In the final main chapter, the explainability of black-box ML models is discussed. 
In particular, the chapter focuses on the evaluation of the explanation derived from model-agnostic explanation methods. With black-box ML models having become common practice, model-agnostic explanation methods have been developed to explain a prediction. However, the evaluation of such explanations is still an open problem. The work introduces an evaluation framework that measures the quality of the explanations in terms of fidelity and complexity. Fidelity refers to the explained mechanism's coherence with the black-box model, while complexity is the length of the explanation. Throughout the thesis, we gave special attention to the experimental design, such that robust conclusions can be reached. Furthermore, we focused on delivering machine learning and evaluation frameworks. This is crucial, as we intend the experimental design and results to be reusable in general ML practice. Accordingly, we also aim for our findings to be applicable beyond music, in domains such as computer vision or natural language processing. Trustworthiness in ML is not a domain-specific problem. Thus, it is vital for both researchers and practitioners from diverse problem spaces to increase awareness of complex ML systems' trustworthiness. We believe the research reported in this thesis provides meaningful stepping stones towards the trustworthiness of ML.

  • Conference Article
  • Cite Count: 10
  • 10.1145/3580305.3599208
International Workshop on Multimodal Learning - 2023 Theme: Multimodal Learning with Foundation Models
  • Aug 4, 2023
  • Yuan Ling + 5 more

The recent advancements in machine learning and artificial intelligence (particularly foundation models such as BERT, GPT-3, T5, ResNet, etc.) have demonstrated remarkable capabilities and driven significant revolutionary changes to the way we make inferences from complex data. These models represent a fundamental shift in the way data are approached and offer exciting new research directions and opportunities for multimodal learning and data fusion. Given the potential of foundation models to transform the field of multimodal learning, there is a need to bring together experts and researchers to discuss the latest developments in this area, exchange ideas, and identify key research questions and challenges that need to be addressed. By hosting this workshop, we aim to create a forum for researchers to share their insights and expertise on multimodal data fusion and learning using foundation models, and to explore potential new research directions and applications in the rapidly evolving field. We expect contributions from interdisciplinary researchers to study and model interactions between (but not limited to) modalities of language, graphs, time-series, vision, tabular data, sensors, and more. Our workshop will emphasize interdisciplinary work and aim at seeding cross-team collaborations around new tasks, datasets, and models.

  • Research Article
  • Cite Count: 27
  • 10.1007/s12553-023-00757-z
A comprehensive review of COVID-19 detection with machine learning and deep learning techniques
  • Jun 7, 2023
  • Health and Technology
  • Sreeparna Das + 2 more

Purpose: The first transmission of coronavirus to humans started in Wuhan city of China, took the shape of a pandemic called Corona Virus Disease 2019 (COVID-19), and posed a principal threat to the entire world. Researchers are trying to apply artificial intelligence (machine learning or deep learning models) for the efficient detection of COVID-19. This research explores the existing machine learning (ML) and deep learning (DL) models used for COVID-19 detection, which may help researchers to explore different directions. The main purpose of this review article is to present a compact overview of the application of artificial intelligence to research experts, helping them to explore the future scope for improvement. Methods: The researchers have used various machine learning, deep learning, and a combination of machine and deep learning models for extracting significant features and classifying various health conditions in COVID-19 patients. For this purpose, the researchers have utilized different image modalities such as CT-Scan, X-Ray, etc. This study has collected over 200 research papers from various repositories like Google Scholar, PubMed, Web of Science, etc. These research papers were passed through various levels of scrutiny and finally, 50 research articles were selected. Results: In those listed articles, the ML / DL models showed an accuracy of 99% and above while performing the classification of COVID-19. This study has also presented various clinical applications of various research. This study specifies the importance of various machine and deep learning models in the field of medical diagnosis and research. Conclusion: In conclusion, it is evident that ML/DL models have made significant progress in recent years, but there are still limitations that need to be addressed. Overfitting is one such limitation that can lead to incorrect predictions and overburdening of the models. 
The research community must continue to work towards finding ways to overcome these limitations and make machine and deep learning models even more effective and efficient. Through this ongoing research and development, we can expect even greater advances in the future.

  • Discussion
  • Cite Count: 8
  • 10.1016/j.ejmp.2021.05.008
Focus issue: Artificial intelligence in medical physics.
  • Mar 1, 2021
  • Physica Medica
  • F Zanca + 11 more

  • Research Article
  • Cite Count: 29
  • 10.1007/s10916-024-02087-7
A Systematic Review of Artificial Intelligence Models for Time-to-Event Outcome Applied in Cardiovascular Disease Risk Prediction
  • Jan 1, 2024
  • Journal of Medical Systems
  • Achamyeleh Birhanu Teshale + 4 more

Artificial intelligence (AI)-based predictive models for early detection of cardiovascular disease (CVD) risk are increasingly being utilised. However, AI-based risk prediction models that account for right-censored data have been overlooked. This systematic review (PROSPERO protocol CRD42023492655) includes 33 studies that utilised machine learning (ML) and deep learning (DL) models for survival outcome in CVD prediction. We provided details on the employed ML and DL models, eXplainable AI (XAI) techniques, and type of included variables, with a focus on social determinants of health (SDoH) and gender-stratification. Approximately half of the studies were published in 2023, with the majority from the United States. Random Survival Forest (RSF), Survival Gradient Boosting models, and Penalised Cox models were the most frequently employed ML models. DeepSurv was the most frequently employed DL model. DL models were better at predicting CVD outcomes than ML models. Permutation-based feature importance and Shapley values were the most utilised XAI methods for explaining AI models. Moreover, only one in five studies performed gender-stratification analysis and very few incorporated the wide range of SDoH factors in their prediction model. In conclusion, the evidence indicates that RSF and DeepSurv models are currently the optimal models for predicting CVD outcomes. This study also highlights the better predictive ability of DL survival models compared to ML models. Future research should ensure the appropriate interpretation of AI models, accounting for SDoH and gender stratification, as gender plays a significant role in CVD occurrence.

  • Research Article
  • Cite Count: 14
  • 10.1016/j.ijmedinf.2025.105812
Deep learning and machine learning in CT-based COPD diagnosis: Systematic review and meta-analysis.
  • Apr 1, 2025
  • International journal of medical informatics
  • Qian Wu + 3 more

  • Research Article
  • Cite Count: 4
  • 10.1016/j.atech.2025.100851
Evaluation of multispectral imaging for freeze damage assessment in strawberries using AI-based computer vision technology
  • Mar 1, 2025
  • Smart Agricultural Technology
  • Sunil Gc + 3 more

  • Dissertation
  • 10.32657/10356/182346
Data efficient deep multimodal learning
  • Jan 1, 2025
  • Meng Shen

Multimodal learning, which enables neural networks to process and integrate information from various sensory modalities such as vision, language, and sound, has become increasingly important in applications ranging from affective computing and healthcare to advanced multimodal chatbots. Despite its potential, multimodal learning faces significant challenges, particularly in the area of data efficiency. The requirement for large, high-quality datasets from multiple modalities presents a substantial barrier, limiting the scalability and accessibility of large multimodal models. This dissertation addresses several key issues in data-efficient deep multimodal learning, focusing on imbalanced multimodal data selection, the cold-start problem in multimodal active learning, and the mitigation of hallucinations in large vision-language models. Firstly, we analyze the limitations of conventional active learning strategies, which tend to favor dominant modalities, leading to unbalanced multimodal models that neglect weaker modalities. To overcome this, we propose a gradient embedding modulation method that ensures a more equitable data selection process across modalities, resulting in models that fairly utilize both weak and strong modalities. Building on our work in warm-start active learning, we tackle the cold-start problem in multimodal active learning, where no initial labels are available for warm-start data selection. We develop a two-stage approach that first reduces the modality representation gap through multimodal self-supervised learning, utilizing unimodal prototypes to harmonize representations across modalities. In the subsequent data selection stage, we introduce a regularization term to maximize modality alignment, leading to improved model performance using the same amount of data compared to existing methods. 
Extending our focus from data selection to the usage of training data, we address the challenge of hallucinations in large vision-language models, where the models generate content that is incorrect in the context of input images. We investigate the relationship between hallucinations and the visual dependence of tokens, revealing that certain tokens contribute disproportionately to these hallucinations. Based on this insight, we propose an approach that adjusts training weights according to the visual dependence of tokens, thereby reducing the hallucination rate without requiring additional training data or inference costs. The contributions of this thesis offer significant advancements in the field of data-efficient multimodal learning. By developing novel methods for balancing multimodal data selection, addressing the cold-start problem in multimodal active learning, and mitigating hallucinations in large vision-language models, this work paves the way for more practical and scalable multimodal learning systems that require less data and computational effort while achieving superior performance.

  • Research Article
  • Cite Count: 123
  • 10.1016/j.inffus.2023.102217
A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges
  • Dec 30, 2023
  • Information Fusion
  • Khaled Bayoudh

  • Research Article
  • Cite Count: 32
  • 10.1155/2022/5849995
Effectiveness of Artificial Intelligence Models for Cardiovascular Disease Prediction: Network Meta-Analysis
  • Feb 24, 2022
  • Computational Intelligence and Neuroscience
  • Yahia Baashar + 6 more

Heart failure is the most common cause of death in both males and females around the world. Cardiovascular diseases (CVDs), in particular, are the main cause of death worldwide, accounting for 30% of all fatalities in the United States and 45% in Europe. Artificial intelligence (AI) approaches such as machine learning (ML) and deep learning (DL) models are playing an important role in the advancement of heart failure therapy. The main objective of this study was to perform a network meta-analysis of patients with heart failure, stroke, hypertension, and diabetes by comparing the ML and DL models. A comprehensive search of five electronic databases was performed using ScienceDirect, EMBASE, PubMed, Web of Science, and IEEE Xplore. The search strategy was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement. The methodological quality of studies was assessed by following the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) guidelines. The random-effects network meta-analysis forest plot with categorical data was used, as were subgroups testing for all four types of treatments and calculating odds ratio (OR) with a 95% confidence interval (CI). Pooled network forest, funnel plots, and the league table, which show the best algorithms for each outcome, were analyzed. Seventeen studies, with a total of 285,213 patients with CVDs, were included in the network meta-analysis. The statistical evidence indicated that the DL algorithms performed well in the prediction of heart failure with AUC of 0.843 and CI [0.840–0.845], while in the ML algorithm, the gradient boosting machine (GBM) achieved an average accuracy of 91.10% in predicting heart failure. An artificial neural network (ANN) performed well in the prediction of diabetes with an OR and CI of 0.0905 [0.0489; 0.1673]. Support vector machine (SVM) performed better for the prediction of stroke with OR and CI of 25.0801 [11.4824; 54.7803]. 
Random forest (RF) performed well in the prediction of hypertension with OR and CI of 10.8527 [4.7434; 24.8305]. The findings of this work suggest that the DL models can effectively advance the prediction of and knowledge about heart failure, but there is a lack of literature regarding DL methods in the field of CVDs. As a result, more DL models should be applied in this field. To confirm our findings, more meta-analyses (e.g., Bayesian network) and thorough research with a larger number of patients are encouraged.
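
This abstract reports odds ratios (OR) with 95% confidence intervals but does not say how they were computed. For readers unfamiliar with the statistic, the sketch below shows one standard way to obtain an OR and its interval from a 2x2 table (the log-based Woolf method). It is an assumption for illustration, not the authors' procedure, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval for a 2x2 table.

    a, b: events and non-events in group 1
    c, d: events and non-events in group 2
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_value, ci_lower, ci_upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval that excludes 1 indicates an association at roughly the 5% level. Because the interval is symmetric on the log scale, it appears asymmetric around the OR itself, as in the figures quoted in the abstract.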
