Multimodal Interaction Research Articles

As technology advances, more research dedicated to medical interactive systems emphasizes the integration of touchless and multimodal interaction (MMI). Particularly in surgical and interventional settings, this approach is advantageous because it maintains sterility and promotes natural interaction. Past reviews have focused on investigating MMI in terms of technology and interaction with robots. However, none has put particular emphasis on analyzing these kinds of interaction for surgical and interventional scenarios. Two databases were queried for relevant publications from the past 10 years. After identification, two screening steps followed, each applying eligibility criteria. A forward/backward search was added to identify further relevant publications. The analysis clustered the references by addressed medical field, input and output modalities, and challenges regarding development and evaluation. A sample of 31 references was obtained (16 journal articles, 15 conference papers). MMI was predominantly developed for laparoscopy and radiology and for interaction with image viewers. The majority implemented two input modalities, with voice-hand interaction being the most common combination: voice for discrete and hand for continuous navigation tasks. The application of gaze, body, and facial control is minimal, primarily because of ergonomic concerns. Feedback was included in 81% of publications, of which visual cues were most often applied. This work systematically reviews MMI for surgical and interventional scenarios over the past decade. For future research, we propose an enhanced focus on in-depth analyses of the considered use cases and on the application of standardized evaluation methods. Moreover, insights from other sectors, including but not limited to the gaming sector, should be exploited.


Large language models (LLMs) are pivotal in artificial intelligence, demonstrating advanced capabilities in natural language understanding and multimodal interactions, with significant potential in medical applications. This study explores the feasibility and efficacy of LLMs, specifically ChatGPT-4o and Claude 3-Opus, in classifying thyroid nodules using ultrasound images. The study included 112 patients with a total of 116 thyroid nodules, comprising 75 benign and 41 malignant cases. Ultrasound images of these nodules were analyzed using ChatGPT-4o and Claude 3-Opus to classify each nodule as benign or malignant. An independent evaluation by a junior radiologist was also conducted. Diagnostic performance was assessed using Cohen's Kappa and receiver operating characteristic (ROC) curve analysis, with pathological diagnoses as the reference standard. ChatGPT-4o demonstrated poor agreement with pathological results (Kappa = 0.116), while Claude 3-Opus showed even lower agreement (Kappa = 0.034). The junior radiologist exhibited moderate agreement (Kappa = 0.450). ChatGPT-4o achieved an area under the ROC curve (AUC) of 57.0% (95% CI: 48.6-65.5%), slightly outperforming Claude 3-Opus (AUC of 52.0%, 95% CI: 43.2-60.9%). In contrast, the junior radiologist achieved a significantly higher AUC of 72.4% (95% CI: 63.7-81.1%). The unnecessary biopsy rates were 41.4% for ChatGPT-4o, 43.1% for Claude 3-Opus, and 12.1% for the junior radiologist. While LLMs such as ChatGPT-4o and Claude 3-Opus show promise for future applications in medical imaging, their current use in clinical diagnostics should be approached cautiously due to their limited accuracy.
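
For readers unfamiliar with the agreement statistic named in the abstract above, the following is a minimal sketch of how Cohen's Kappa can be computed from paired labels. The function name `cohens_kappa` and the ten toy labels are hypothetical illustrations; they are not the study's data or code, and the study's own analysis pipeline is not described in the abstract.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement: fraction of cases where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (NOT taken from the study): 6 benign and 4 malignant
# reference diagnoses, with one benign and one malignant case misclassified.
pathology = ["benign"] * 6 + ["malignant"] * 4
reader = ["benign"] * 5 + ["malignant"] + ["malignant"] * 3 + ["benign"]

kappa = cohens_kappa(pathology, reader)
print(f"kappa = {kappa:.3f}")
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why values such as 0.116 and 0.034 are read as poor agreement with pathology.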


Articles published on Multimodal Interaction

Multimodal fusion-powered English speaking robot

Considering Ornaments, Necklaces and Floors: Negotiations of Everyday Family Activities Through the ‘Collective Action’ of Multiple Actors

Enhancing Human-Computer Interaction in Augmented Reality (AR) and Virtual Reality (VR) Environments: The Role of Adaptive Interfaces and Haptic Feedback Systems

High-Order Multimodal Interaction Network for Efficient Prediction of Drug-Drug Interaction

A Multilingual Preschooler’s School Belonging: The Role of Translanguaging Pedagogy

Affording Social Experience for Adolescents Using Immersive Virtual Reality: A Moderated Mediation Analysis

Audiovisual Imagery in Multimodal Perception and Performance of Qigang Chen’s Er Huang Concerto

Multimodal Locomotion and Dynamic Interaction of Hydrogel Microdisks at the Air-Water Interface under Magnetic and Light Stimuli.

Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering

Multimodal Co-Attention Fusion Network With Online Data Augmentation for Cancer Subtype Classification.

Multimodal human-computer interaction in interventional radiology and surgery: a systematic literature review.

A Multi-Hierarchical Complementary Feature Interaction Network for Accelerated Multi-Modal MR Imaging

Diverse Role of Buffer Mediums and Protein Concentrations to Mediate the Multimodal Interaction of Phenylalanine-Functionalized Gold Nanoparticle and Lysozyme Protein at Same Nominal pH.

Multimodal Academic Discourse Socialization: Examining Geoscience Students’ Disciplinary Knowledge Construction and Socialization at a Canadian University

Multimodal human computer interaction of wheelchairs supporting lower limb active rehabilitation

Social Media Sentiment Analysis

Reconstructing representations using diffusion models for multimodal sentiment analysis through reading comprehension

A Parent’s Multimodal Support to Bilingual Child’s Reading: A Case Study of Korean Immigrant Family

Assessing the feasibility of ChatGPT-4o and Claude 3-Opus in thyroid nodule classification based on ultrasound images.

Teasing via the [lo, ki ‘no, because’ + ironic utterance] structure in Hebrew talk-in-interaction
