Breath-based esophageal cancer diagnosis using an electronic nose with multimodal sensor array and machine learning

Similar Papers
  • Research Article
  • Cite count: 7
  • 10.1016/j.matpr.2020.01.335
Combinatorial gas phase electrodeposition for fabrication of three-dimensional multimodal gas sensor array
  • Jan 1, 2020
  • Materials Today: Proceedings
  • Nishchay A Isaac + 7 more

  • Research Article
  • Cite count: 36
  • 10.1016/j.eswa.2023.121168
A survey on multimodal bidirectional machine learning translation of image and natural language processing
  • Aug 12, 2023
  • Expert Systems with Applications
  • Wongyung Nam + 1 more

  • Research Article
  • Cite count: 6
  • 10.13374/j.issn2095-9389.2019.03.21.003
A survey of multimodal machine learning
  • May 1, 2020
  • SHILAP Revista de lepidopterología
  • Peng Chen + 5 more

“Big data” is always collected from different resources that have different data structures. With the rapid development of information technologies, today's valuable data resources are typically multimodal. As a result, building on classical machine learning strategies, multi-modal learning has become a valuable research topic, enabling computers to process and understand “big data”. The cognitive processes of humans involve perception through different sense organs. Signals from the eyes, ears, nose, and hands (tactile sense) constitute a person’s understanding of a specific scene or the world as a whole. It is reasonable to believe that multi-modal methods with a greater ability to process complex heterogeneous data can further promote the progress of information technologies. The concept of multimodality originated in psychology and pedagogy hundreds of years ago and has become popular in computer science during the past decade. In contrast to the concept of “media”, a “mode” is a more fine-grained concept that is associated with a typical data source or data form. The effective utilization of multi-modal data can help a computer understand a specific environment in a more holistic way. In this context, we first introduced the definition and main tasks of multi-modal learning. Based on this information, the mechanism and origin of multi-modal machine learning were then briefly introduced. Subsequently, statistical learning methods and deep learning methods for multi-modal tasks were comprehensively summarized. We also introduced the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, novel adversarial learning strategies for cross-modal matching or generation were reviewed. The main methods for multi-modal learning were outlined in this paper with a focus on future research issues in this field.

  • Research Article
  • Cite count: 28
  • 10.18034/ajhal.v4i2.658
Analysis of Multimodal Data Using Deep Learning and Machine Learning
  • Dec 31, 2017
  • Asian Journal of Humanity, Art and Literature
  • Swetha Reddy Thodupunori

A modality is an event or experience. Life is multimodal: we see, hear, smell, feel, and taste. Multimodal experiences involve several of these modalities. Artificial intelligence must grasp multimodal views to understand our surroundings. Multimodal machine learning models interact with and correlate input from several modalities. It is a multi-disciplinary field with great potential. In this study, we analyze emerging multimodal machine learning technologies and categorize them scientifically rather than focusing on specific multimodal applications. Multimodal machine learning offers more potential and problems than classifications. Most multimodal learning research collects quantitative data from polls and surveys. This research reviews a detailed library of observational studies on multimodal data (MMD) skills for human learning using artificial intelligence-powered approaches, including Machine Learning and Deep Learning. This research also describes how MMD has improved learning and in what environments. This paper discusses multimodal learning, its ongoing improvements, and approaches to improving learning. Finally, future researchers should carefully consider building a system that aligns multimodal aspects with the study and learning plan. These elements could enhance multimodal learning by facilitating theory and practice activities. This research lays the groundwork for multimodal data use in future learning technologies and development.

  • Research Article
  • Cite count: 4
  • 10.2196/72822
Decoding Digital Discourse Through Multimodal Text and Image Machine Learning Models to Classify Sentiment and Detect Hate Speech in Race- and Lesbian, Gay, Bisexual, Transgender, Queer, Intersex, and Asexual Community-Related Posts on Social Media: Quantitative Study.
  • May 12, 2025
  • Journal of medical Internet research
  • Thu T Nguyen + 10 more

A major challenge in sentiment analysis on social media is the increasing prevalence of image-based content, which integrates text and visuals to convey nuanced messages. Traditional text-based approaches have been widely used to assess public attitudes and beliefs; however, they often fail to fully capture the meaning of multimodal content where cultural, contextual, and visual elements play a significant role. This study aims to provide practical guidance for collecting, processing, and analyzing social media data using multimodal machine learning models. Specifically, it focuses on training and fine-tuning models to classify sentiment and detect hate speech. Social media data were collected from Facebook and Instagram using CrowdTangle, a public insights tool by Meta, and from X via its academic research application programming interface. The dataset was filtered to include only race-related terms and lesbian, gay, bisexual, transgender, queer, intersex, and asexual community-related posts with image attachments, ensuring focus on multimodal content. Human annotators labeled 13,000 posts into 4 categories: negative sentiment, positive sentiment, hate, or antihate. We evaluated unimodal (Bidirectional Encoder Representations from Transformers for text and Visual Geometry Group 16 for images) and multimodal (Contrastive Language-Image Pretraining [CLIP], Visual Bidirectional Encoder Representations from Transformers [VisualBERT], and an intermediate fusion) models. To enhance model performance, the synthetic minority oversampling technique was applied to address class imbalances, and latent Dirichlet allocation was used to improve semantic representations. Our findings highlighted key differences in model performance. Among unimodal models, Bidirectional Encoder Representations from Transformers outperformed Visual Geometry Group 16, achieving higher accuracy and macro-F1-scores across all tasks. Among multimodal models, CLIP achieved the highest accuracy (0.86) in negative sentiment detection, followed by VisualBERT (0.84). For positive sentiment, VisualBERT outperformed other models with the highest accuracy (0.76). In hate speech detection, the intermediate fusion model demonstrated the highest accuracy (0.91) with a macro-F1-score of 0.64, ensuring balanced performance. Meanwhile, VisualBERT performed best in antihate classification, achieving an accuracy of 0.78. Applying latent Dirichlet allocation and the synthetic minority oversampling technique improved minority class detection, particularly for antihate content. Overall, the intermediate fusion model provided the most balanced performance across tasks, while CLIP excelled in accuracy-driven classifications. Although VisualBERT performed well in certain areas, it struggled to maintain a precision-recall balance. These results emphasized the effectiveness of multimodal approaches over unimodal models in analyzing social media sentiment. This study contributes to the growing research on multimodal machine learning by demonstrating how advanced models, data augmentation techniques, and diverse datasets can enhance the analysis of social media content. The findings offer valuable insights for researchers, policy makers, and public health professionals seeking to leverage artificial intelligence for social media monitoring and addressing broader societal challenges.
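
The abstract above reports both accuracy and macro-F1 for the same task, and the two can diverge sharply when classes are imbalanced. The following is a minimal sketch of why, not the study's evaluation code: the label names, class sizes, and predictions are hypothetical toy values chosen only to illustrate the effect.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: 90 "other" posts and 10 "hate" posts.
y_true = ["other"] * 90 + ["hate"] * 10
# A hypothetical classifier that recovers only 2 of the 10 minority posts.
y_pred = ["other"] * 90 + ["hate"] * 2 + ["other"] * 8

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro-F1={f1:.2f}")
```

Because macro-F1 weights every class equally, poor recall on the minority class pulls it well below accuracy, which is one reason to report both metrics on imbalanced data.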

  • Research Article
  • 10.1121/1.5035696
Multimodal signal processing and machine learning for hearing instruments
  • Mar 1, 2018
  • The Journal of the Acoustical Society of America
  • Tao Zhang + 1 more

With advances in wireless technology and sensor miniaturization, more and more non-audio sensors are becoming available to, and are being integrated into, hearing instruments. These sensors not only help improve speech understanding and sound quality but also enhance usability and expand the hearing instruments' capabilities to health and wellness monitoring. However, the introduction of these sensors also presents a new set of challenges to researchers and engineers. Compared with traditional audio sensors for hearing instruments, these new sensor inputs can come from different modalities and often have different scales and sampling frequencies. In some cases, they are not linear or synchronized with each other. In this presentation, we will review these challenges in detail in the context of hearing instrument applications. Furthermore, we will demonstrate how multimodal signal processing and machine learning can be used to overcome these challenges and bring a greater degree of satisfaction to end users. Finally, future directions in multimodal signal processing and machine learning research for hearing instruments will be discussed.

  • Research Article
  • Cite count: 53
  • 10.3390/electronics12071558
Reviewing Multimodal Machine Learning and Its Use in Cardiovascular Diseases Detection
  • Mar 26, 2023
  • Electronics
  • Mohammad Moshawrab + 4 more

Machine Learning (ML) and Deep Learning (DL) are derivatives of Artificial Intelligence (AI) that have already demonstrated their effectiveness in a variety of domains, including healthcare, where they are now routinely integrated into patients’ daily activities. On the other hand, data heterogeneity has long been a key obstacle in AI, ML and DL. Here, Multimodal Machine Learning (Multimodal ML) has emerged as a method that enables the training of complex ML and DL models that use heterogeneous data in their learning process. In addition, Multimodal ML enables the integration of multiple models in the search for a single, comprehensive solution to a complex problem. In this review, the technical aspects of Multimodal ML are discussed, including a definition of the technology and its technical underpinnings, especially data fusion. It also outlines the differences between this technology and others, such as Ensemble Learning, as well as the various workflows that can be followed in Multimodal ML. In addition, this article examines in depth the use of Multimodal ML in the detection and prediction of Cardiovascular Diseases, highlighting the results obtained so far and the possible starting points for improving its use in the aforementioned field. Finally, a number of the most common problems hindering the development of this technology and potential solutions that could be pursued in future studies are outlined.

  • Dissertation
  • Cite count: 3
  • 10.26686/wgtn.17003935
L'Arte Di Interazione Musicale: New Musical Possibilities Through Multimodal Techniques
  • Jan 1, 2013
  • Jordan Natan Hochenbaum

Multimodal communication is an essential aspect of human perception, facilitating the ability to reason, deduce, and understand meaning. Utilizing multimodal senses, humans are able to relate to the world in many different contexts. This dissertation looks at surrounding issues of multimodal communication as it pertains to human-computer interaction. If humans rely on multimodality to interact with the world, how can multimodality benefit the ways in which humans interface with computers? Can multimodality be used to help the machine understand more about the person operating it and what associations derive from this type of communication? This research places multimodality within the domain of musical performance, a creative field rich with nuanced physical and emotive aspects. This dissertation asks, what kinds of new sonic collaborations between musicians and computers are possible through the use of multimodal techniques? Are there specific performance areas where multimodal analysis and machine learning can benefit training musicians? In similar ways can multimodal interaction or analysis support new forms of creative processes? Applying multimodal techniques to music-computer interaction is a burgeoning effort. As such the scope of the research is to lay a foundation of multimodal techniques for the future. In doing so the first work presented is a software system for capturing synchronous multimodal data streams from nearly any musical instrument, interface, or sensor system. This dissertation also presents a variety of multimodal analysis scenarios for machine learning. This includes automatic performer recognition for both string and drum instrument players, to demonstrate the significance of multimodal musical analysis. Training the computer to recognize who is playing an instrument suggests important information is contained not only within the acoustic output of a performance, but also in the physical domain. Machine learning is also used to perform automatic drum-stroke identification; training the computer to recognize which hand a drummer uses to strike a drum. There are many applications for drum-stroke identification including more detailed automatic transcription, interactive training (e.g. computer-assisted rudiment practice), and enabling efficient analysis of drum performance for metrics tracking. Furthermore, this research also presents the use of multimodal techniques in the context of everyday practice. A practicing musician played a sensor-augmented instrument and recorded his practice over an extended period of time, realizing a corpus of metrics and visualizations from his performance. Additional multimodal metrics are discussed in the research, and demonstrate new types of performance statistics obtainable from a multimodal approach. The primary contributions of this work include (1) a new software tool enabling musicians, researchers, and educators to easily capture multimodal information from nearly any musical instrument or sensor system; (2) investigating multimodal machine learning for automatic performer recognition of both string players and percussionists; (3) multimodal machine learning for automatic drum-stroke identification; (4a) applying multimodal techniques to musical pedagogy and training scenarios; (4b) investigating novel multimodal metrics; (5) lastly this research investigates the possibilities, affordances, and design considerations of multimodal musicianship both in the acoustic domain, as well as in other musical interface scenarios. This work provides a foundation from which engaging musical-computer interactions can occur in the future, benefitting from the unique nuances of multimodal techniques.

  • Research Article
  • 10.66535/761h2729
Machine learning-enabled electronic noses and electronic tongues: A new paradigm for markers detection
  • Jan 1, 2026
  • Mozi
  • Xiaoyao Wu + 2 more

With the rapid development of science and technology, artificial intelligence (AI) and machine learning (ML) have become the forefront of innovation. Electronic noses and tongues, which mimic human smell and taste perception, have great potential in fields such as environmental monitoring, food safety, medical diagnosis and quality control. However, the traditional electronic nose and tongue systems still face challenges due to the complexity of marker morphology and the limitations of sensors. Machine learning technology has also proven to be a valuable tool for improving traditional electronic nose and tongue technology. In this review, we introduce the principle and design of machine learning combined with electronic nose and tongue technology, analyze some recently published articles on machine learning-assisted electronic nose and tongue for marker detection, and review the practical application of machine learning technology in the field of electronic nose and tongue. It is believed that through continuous exploration and innovation, machine learning will promote the application of electronic nose and electronic tongue, realize intelligent monitoring and control, and provide help for human life and health.

  • Research Article
  • 10.1038/s41598-025-33610-6
Development of a multimodal magnetic resonance imaging-based machine learning prediction model for flight cadets
  • Jan 5, 2026
  • Scientific Reports
  • Lu Ye + 4 more

In the realm of civil aviation, the existing methods for selecting and training flight cadets have limitations, such as long evaluation cycles and susceptibility to subjective factors. This study integrated multimodal magnetic resonance imaging (MRI) data, including structural MRI (sMRI), diffusion tensor imaging (DTI), and functional MRI (fMRI), with machine learning techniques. The aim was to construct prediction models capable of accurately differentiating flight cadets from ground cadets. Data were collected from 39 flight cadets with extensive flight training and 37 ground cadets. Representative features were extracted from each modality and fused at the feature level. Four machine learning classification algorithms, namely logistic regression (LR), random forest, support vector machine, and Gaussian naive Bayes, were employed for model construction. Five-fold cross-validation and permutation tests were conducted to assess model reliability. The results revealed that the multimodal fusion model (sMRI + DTI + fMRI + LR) exhibited the best performance, achieving an accuracy of 0.838, an area under the receiver operating characteristic curve (AUC) of 0.942, a sensitivity of 0.835, and a specificity of 0.834. Through SHapley Additive exPlanations analysis, features with high contributions to the classification were identified, which were closely associated with advanced cognitive functions, visual processing, and attention allocation. This research not only offers a novel approach for the selection and training evaluation of flight cadets but also demonstrates the potential of multimodal MRI-based machine learning models in exploring the neural mechanisms underlying flight-related skills.
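
Two steps named in the abstract above, feature-level fusion and five-fold cross-validation, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the study's pipeline: the helper names `fuse_features` and `k_fold_indices`, the feature counts, and the feature values are all hypothetical, and the folds here are contiguous rather than shuffled or stratified.

```python
def fuse_features(*modalities):
    """Feature-level fusion: concatenate each subject's per-modality vectors."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality needs one feature vector per subject")
    return [sum((list(m[i]) for m in modalities), []) for i in range(n)]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# 39 flight cadets + 37 ground cadets, as in the abstract; features are made up.
n_subjects = 39 + 37
smri = [[float(i), 1.0] for i in range(n_subjects)]        # 2 hypothetical features
dti = [[0.5 * i] for i in range(n_subjects)]               # 1 hypothetical feature
fmri = [[i % 3, i % 5, i % 7] for i in range(n_subjects)]  # 3 hypothetical features

fused = fuse_features(smri, dti, fmri)
folds = list(k_fold_indices(len(fused), k=5))
fold_sizes = [len(test) for _, test in folds]
print(len(fused[0]), fold_sizes)
```

In practice the subjects would usually be shuffled and the folds stratified by class before training each classifier; those steps are omitted here to keep the sketch short.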

  • Research Article
  • Cite count: 10
  • 10.3390/bioengineering12050477
Advancements in Medical Radiology Through Multimodal Machine Learning: A Comprehensive Overview.
  • Apr 30, 2025
  • Bioengineering (Basel, Switzerland)
  • Imran Ul Haq + 5 more

The majority of data collected and obtained from various sources over a patient's lifetime can be assumed to comprise pertinent information for delivering the best possible treatment. Medical data, such as radiographic and histopathology images, electrocardiograms, and medical records, all guide a physician's diagnostic approach. Nevertheless, most machine learning techniques in the healthcare field emphasize data analysis from a single modality, which is insufficiently reliable. This is especially evident in radiology, which has long been an essential topic of machine learning in healthcare because of its high data density, availability, and interpretation capability. In the future, computer-assisted diagnostic systems must be intelligent enough to process a variety of data simultaneously, similar to how doctors examine various resources while diagnosing patients. By extracting novel characteristics from diverse medical data sources, advanced identification techniques known as multimodal learning may be applied, enabling algorithms to analyze data from various sources and eliminating the need to train each modality. This approach enhances the flexibility of algorithms by incorporating diverse data. A growing body of current research has focused on extracting data from multiple sources and constructing precise multimodal machine/deep learning models for medical examinations. A comprehensive analysis and synthesis of recent publications focusing on multimodal machine learning in detecting diseases is provided. Potential future research directions are also identified. This review presents an overview of multimodal machine learning (MMML) in radiology, a field at the cutting edge of integrating artificial intelligence into medical imaging. As radiological practices continue to evolve, the combination of various imaging and non-imaging data modalities is gaining increasing significance.
This paper analyzes current methodologies, applications, and trends in MMML while outlining challenges and predicting upcoming research directions. Beginning with an overview of the different data modalities involved in radiology, namely, imaging, text, and structured medical data, this review explains the processes of modality fusion, representation learning, and modality translation, showing how they boost diagnosis efficacy and improve patient care. Additionally, this review discusses key datasets that have been instrumental in advancing MMML research. This review may help clinicians and researchers comprehend the spatial distribution of the field, outline the current level of advancement, and identify areas of research that need to be explored regarding MMML in radiology.

  • Research Article
  • 10.1109/tcsvt.2022.3160751
Guest Editorial Special Section on Learning With Multimodal Data for Biomedical Informatics
  • May 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Zhangyang Wang + 5 more

In this Special Section of the IEEE Transactions on Circuits and Systems for Video Technology, it is our honor to present emerging advanced machine learning and data analytics algorithms aiming at catalyzing synergies among image/video processing, text/speech understanding, and multimodal learning in biomedical informatics. Our goals are to 1) introduce novel data-driven models to accelerate knowledge discovery in biomedicine through the seamless integration of medical data collected from imaging systems, laboratory and wearable devices, as well as other related medical devices; 2) promote the development of new multi-modal learning systems to enhance healthcare quality and patient safety; and 3) promote new applications in biomedical informatics that can leverage or benefit from the integration of multi-modal data and machine learning.

  • Research Article
  • Cite count: 2
  • 10.1101/2025.08.07.25333235
A Systematic Review of Multimodal Deep Learning and Machine Learning Fusion Techniques for Prostate Cancer Classification.
  • Aug 11, 2025
  • medRxiv : the preprint server for health sciences
  • Farhana Manzoor + 6 more

Prostate cancer remains one of the most prevalent malignancies and a leading cause of cancer-related deaths among men worldwide. Despite advances in traditional diagnostic methods such as prostate-specific antigen testing, digital rectal examination, and multiparametric magnetic resonance imaging, these approaches remain constrained by modality-specific limitations, suboptimal sensitivity and specificity, and reliance on expert interpretation, which may introduce diagnostic inconsistency. Multimodal deep learning and machine learning fusion, which integrates diverse data sources including imaging, clinical, and molecular information, has emerged as a promising strategy to enhance the accuracy of prostate cancer classification. This review aims to outline the current state-of-the-art deep learning and machine learning-based fusion techniques for prostate cancer classification, focusing on their implementation, performance, challenges, and clinical applicability. Following the PRISMA guidelines, a total of 131 studies were identified, of which 27 met the inclusion criteria for studies published between 2021 and 2025. Extracted data included input techniques, deep learning architectures, performance metrics, and validation approaches. The majority of the studies used an early fusion approach with convolutional neural networks to integrate the data. Clinical and imaging data were the most commonly used modalities in the reviewed studies for prostate cancer research. Overall, multimodal deep learning and machine learning-based fusion significantly advances prostate cancer classification and outperforms unimodal approaches.
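
The abstract above notes that most reviewed studies used early fusion. As a rough illustration of what that term means, the sketch below contrasts early fusion (concatenate the modalities, then apply one model) with late fusion (score each modality separately, then combine the outputs). It is a toy example under stated assumptions: a linear scorer with a sigmoid stands in for the convolutional networks the review describes, and all weights and feature values are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(features, weights, bias=0.0):
    """Weighted sum of the features (a stand-in for a trained model)."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion_predict(imaging, clinical, weights):
    """Early fusion: concatenate modalities first, then apply a single model."""
    fused = list(imaging) + list(clinical)
    return sigmoid(linear_score(fused, weights))


def late_fusion_predict(imaging, clinical, w_img, w_clin):
    """Late fusion: one model per modality, then average the probabilities."""
    p_img = sigmoid(linear_score(imaging, w_img))
    p_clin = sigmoid(linear_score(clinical, w_clin))
    return (p_img + p_clin) / 2.0


# Hypothetical feature vectors and weights, for illustration only.
imaging = [0.2, -0.1, 0.4]
clinical = [1.0, 0.5]
p_early = early_fusion_predict(imaging, clinical, [0.5, 0.5, 0.5, 0.1, 0.1])
p_late = late_fusion_predict(imaging, clinical, [0.5, 0.5, 0.5], [0.1, 0.1])
print(f"early={p_early:.3f}, late={p_late:.3f}")
```

Even with identical weights the two strategies give different probabilities, because early fusion combines evidence before the nonlinearity while late fusion combines it afterwards.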

  • Research Article
  • 10.3389/fonc.2025.1558880
Integrating multimodal ultrasound imaging and machine learning for predicting luminal and non-luminal breast cancer subtypes
  • Oct 8, 2025
  • Frontiers in Oncology
  • Yan Fu + 13 more

Rationale and Objectives: Breast cancer molecular subtypes significantly influence treatment outcomes and prognoses, necessitating precise differentiation to tailor individualized therapies. This study leverages multimodal ultrasound imaging combined with machine learning to preoperatively classify luminal and non-luminal subtypes, aiming to enhance diagnostic accuracy and clinical decision-making.
Methods: This retrospective study included 247 patients with breast cancer, with 192 meeting the inclusion criteria. Patients were randomly divided into a training set (134 cases) and a validation set (58 cases) in a 7:3 ratio. Image segmentation was conducted using 3D Slicer software, adhering to IBSI-standardized radiomics feature extraction. We constructed four model configurations (monomodal, dual-modal, trimodal, and four-modal) through optimized feature selection. These included monomodal datasets comprising 2D ultrasound (US) images, dual-modal datasets integrating 2D US with color Doppler flow imaging (CDFI) (US+CDFI), trimodal datasets incorporating strain elastography (SE) alongside 2D US and CDFI (US+CDFI+SE), and four-modal datasets combining all modalities, including ABVS coronal imaging (US+CDFI+SE+ABVS). Machine learning classifiers such as logistic regression (LR), support vector machines (SVM), AdaBoost (adaptive boosting), random forests (RF), linear discriminant analysis (LDA), and ridge regression were utilized.
Results: The four-modal model achieved the highest performance (AUC: 0.947, 95% CI: 0.884-0.986), significantly outperforming the monomodal model (AUC 0.758, ΔAUC +0.189). Multimodal integration progressively enhanced performance: trimodal models surpassed dual-modal and monomodal approaches (AUC 0.865 vs 0.741 and 0.758), and the four-modal framework showed marked improvements in sensitivity (88.4% vs 71.1% for monomodal), specificity (92.7% vs 70.1%), and F1 scores (0.905).
Conclusion: This study establishes a multimodal machine learning model integrating advanced ultrasound imaging techniques to preoperatively distinguish luminal from non-luminal breast cancers. The model demonstrates significant potential to improve diagnostic accuracy and generalization, representing a notable advancement in non-invasive breast cancer diagnostics.
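
The split sizes and the AUC gain quoted above can be checked with simple arithmetic. The sketch below is a minimal illustration, assuming a seeded shuffle and truncation of the training count to a whole number; the abstract states only that the split was random and 7:3, so the function name, the seed, and the rounding rule are assumptions.

```python
import random


def split_train_validation(ids, train_fraction=0.7, seed=0):
    """Shuffle the ids and cut them into training and validation subsets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # the seed is an arbitrary assumption
    n_train = int(len(shuffled) * train_fraction)  # truncates 134.4 to 134
    return shuffled[:n_train], shuffled[n_train:]


patients = range(192)  # 192 patients met the inclusion criteria
train, val = split_train_validation(patients)

# Gain of the four-modal model over the monomodal model, from the reported AUCs.
delta_auc = round(0.947 - 0.758, 3)
print(len(train), len(val), delta_auc)
```

With 192 patients, 70% is 134.4, which gives the 134/58 split reported in the abstract once the training count is truncated.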
