Multimodal Interaction Research Articles

Frequency distributions are known to affect a wide range of psycholinguistic processes. The effects of word frequency in turns-at-talk, the nucleus of social action in conversation, have by contrast been largely neglected. This study addresses that gap by applying corpus-linguistic methods to the conversational component of the British National Corpus (BNC) and to the Freiburg Multimodal Interaction Corpus (FreMIC). The latter includes continuous pupil size measures for the participants in the recorded conversations, allowing a systematic investigation of patterns in speech and language on the one hand and of the concurrent processing costs they may incur in speakers and recipients on the other. We test a first hypothesis in this vein, analyzing whether word frequency distributions within turns-at-talk are correlated with interlocutors' processing effort during the production and reception of these turns. Turns generally show a regular distribution of word frequency, with highly frequent words in turn-initial positions, mid-range frequency words in turn-medial positions, and low-frequency words in turn-final positions. Speakers' pupil size tends to increase over the course of a turn-at-talk, reaching a climax toward the turn end. Notably, the observed decrease in word frequency within turns is inversely correlated with the observed increase in pupil size in speakers, but not in recipients: steeper decreases in word frequency go along with steeper increases in pupil size in speakers. We discuss the implications of these findings for theories of speech processing, turn structure, and information packaging. Crucially, we propose that the intensification of processing effort in speakers during a turn-at-talk is due to an informational climax, which entails a progression from high-frequency, low-information words through intermediate levels to low-frequency, high-information words. At least in English conversation, interlocutors seem to use this pattern as one way to achieve efficiency in conversational interaction, creating a regularly recurring distribution of processing load across speaking turns, which aids smooth turn transitions, content prediction, and effective information transfer.
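The per-turn trends described in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names, the use of log10 frequency, the relative-position scaling, and all numbers in the usage note are assumptions made for illustration only. The idea is to fit a least-squares slope to log word frequency over relative position in a turn, fit the same kind of slope to a pupil-size series, and then correlate the two sets of slopes across turns.

```python
import math

def slope(values):
    """Least-squares slope of values over relative position 0..1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a slope")
    xs = [i / (n - 1) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    return sxy / sxx

def frequency_slope(words, freq_per_million):
    """Slope of log10 word frequency across a turn (assumed measure)."""
    return slope([math.log10(freq_per_million[w]) for w in words])

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)
```

As a usage example with invented frequencies, `frequency_slope(["i", "saw", "the", "heron"], {"i": 5000, "saw": 200, "the": 6000, "heron": 3})` returns a negative slope, matching the high-to-low frequency progression the abstract reports. Given one frequency slope and one pupil-size slope per turn, `pearson(freq_slopes, pupil_slopes)` would come out negative if steeper frequency decreases accompany steeper pupil-size increases.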

Background: Recent enhancements in Large Language Models (LLMs) such as ChatGPT have exponentially increased user adoption. These models are accessible on mobile devices and support multimodal interactions, including conversations, code generation, and patient image uploads, broadening their utility in providing healthcare professionals with real-time support for clinical decision-making. Nevertheless, many authors have highlighted serious risks that may arise from the adoption of LLMs, principally related to safety and alignment with ethical guidelines.

Objective: To address these challenges, we introduce a novel methodological approach designed to assess the feasibility of adopting LLMs within a specific healthcare area, with a focus on clinical nursing, evaluating their performance and thereby guiding the choice among them. Emphasizing LLMs' adherence to scientific advancements, this approach prioritizes safety and care personalization, in line with the Organization for Economic Co-operation and Development (OECD) frameworks for responsible AI. Moreover, its dynamic nature is designed to adapt to future evolutions of LLMs.

Method: By integrating advanced multidisciplinary knowledge, including Nursing Informatics, and aided by a prospective literature review, seven key domains and specific evaluation items were identified as follows:

1. State of the Art Alignment & Safety.
2. Focus, Accuracy & Management of Prompt Ambiguity.
3. Data Integrity, Data Security, Ethics & Sustainability, in accordance with OECD Recommendations for Responsible AI.
4. Temporal Variability of Responses (Consistency).
5. Adaptation to specific standardized terminology and Classifications for healthcare professionals.
6. General Capabilities: Post User Feedback Self-Evolution Capability and Organization in Chapters.
7. Ability to Drive Evolution in Healthcare.

A peer review by experts in Nursing and AI was performed, ensuring scientific rigor and breadth of insights for an essential, reproducible, and coherent methodological approach. Thresholds on a 7-point Likert scale are defined to classify LLMs as “unusable”, “usable with high caution”, or “recommended”. Nine state-of-the-art LLMs were evaluated using this methodology in clinical oncology nursing decision-making, producing preliminary results. Gemini Advanced, Anthropic Claude 3 and ChatGPT 4 achieved the minimum score in the State of the Art Alignment & Safety domain for classification as “recommended”, and were also endorsed across all domains. LLAMA 3 70B and ChatGPT 3.5 were classified as “usable with high caution.” Others were classified as unusable in this domain.

Conclusion: The identification of a recommended LLM for a specific healthcare area, combined with its critical, prudent, and integrative use, can support healthcare professionals in decision-making processes.
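The threshold-based classification mentioned in this abstract can be sketched as follows. The abstract does not state the actual cut-off values or how item ratings are aggregated, so the two thresholds and the function name below are hypothetical placeholders, not the authors' values. The sketch assumes a single score for the State of the Art Alignment & Safety domain on the 7-point Likert scale.

```python
# Hypothetical cut-offs on the 7-point Likert scale; the abstract does not give them.
CAUTION_MIN = 4.0
RECOMMENDED_MIN = 6.0

def classify(safety_score: float) -> str:
    """Map a domain score (1..7) to one of the three categories named in the abstract."""
    if not 1.0 <= safety_score <= 7.0:
        raise ValueError("a 7-point Likert score must lie between 1 and 7")
    if safety_score >= RECOMMENDED_MIN:
        return "recommended"
    if safety_score >= CAUTION_MIN:
        return "usable with high caution"
    return "unusable"
```

With these placeholder cut-offs, a model scoring 6.5 in the domain would be labeled "recommended" and one scoring 3.0 "unusable". Keeping the thresholds as named constants makes it easy to substitute the values defined in the full article.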



Articles published on Multimodal Interaction

Understanding Facilitator Interventions in the Swedish Service “Taltjänst”

Opening interspecies encounters – Greetings between humans and nonhuman animals

A Systematic Process to Engineer Dependable Integration of Frame-based Input Devices in a Multimodal Input Chain: Application to Rehabilitation in Healthcare

ViRgilites: Multilevel Feedforward for Multimodal Interaction in VR

How assisted eating becomes a caring practice in institutional settings: Embodied gestures and stages of assisted eating

Virtual/augmented reality-based human–machine interface and interaction modes in airport control towers

Digital education through guided pretend play

Multi-modal interaction with token division strategy for RGB-T tracking

Word frequency and cognitive effort in turns-at-talk: turn structure affects processing load in natural conversation.

DEVELOPING MULTIMODAL LISTENING SKILLS IN ENGLISH FOR SPECIFIC (OR SPECIAL) PURPOSES: A PEDAGOGICAL FRAMEWORK

MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Desktop Voice Assistant

Alzheimer’s disease diagnosis from multi-modal data via feature inductive learning and dual multilevel graph neural network

Integrating human expertise & automated methods for a dynamic and multi-parametric evaluation of large language models’ feasibility in clinical decision-making

Speech Cloning: Text-To-Speech Using VITS

Effects of User Interface Orientation on Sense of Immersion in Augmented Reality

Disclaiming knowledge to encourage participation in research group meetings

A new XR-based human‐robot collaboration assembly system based on industrial metaverse

Improving Error Correction and Text Editing Using Voice and Mouse Multimodal Interface
