Textual Modalities Research Articles

ABSTRACT With the increasing frequency of floods, in-depth flood event analyses are essential for effective disaster relief and prevention. Satellite-based flood event datasets have become the primary data source for flood event analyses instead of limited disaster maps due to their enhanced availability. Nevertheless, despite the vast amount of available remote sensing images, existing flood event datasets continue to pose significant challenges in flood event analyses due to the uneven geographical distribution of data, the scarcity of time series data, and the limited availability of flood-related semantic information. There has been a surge in acceptance of deep learning models for flood event analyses, but some existing flood datasets do not align well with model training, and distinguishing flooded areas has proven difficult with limited data modalities and semantic information. Moreover, efficient retrieval and pre-screening of flood-related imagery from vast satellite data impose notable obstacles, particularly within large-scale analyses. To address these issues, we propose a Multimodal Flood Event Dataset (MFED) for deep-learning-based flood event analyses and data retrieval. It consists of 18 years of multi-source remote sensing imagery and heterogeneous textual information covering flood-prone areas worldwide. Incorporating optical and radar imagery can exploit the correlation and complementarity between distinct image modalities to capture the pixel features in flood imagery. It is worth noting that text modality data, including auxiliary hydrological information extracted from the Global Flood Database and text information refined from online news records, can also offer a semantic supplement to the images for flood event retrieval and analysis. To verify the applicability of the MFED in deep learning models, we carried out experiments with different models using a single modality and different combinations of modalities, which fully verified the effectiveness of the dataset. Furthermore, we also verify the efficiency of the MFED in comparative experiments with existing multimodal datasets and diverse neural network structures.

Read full abstract

Emotion detection holds significant importance in facilitating human–computer interaction, enhancing the depth of engagement. By integrating this capability, we pave the way for forthcoming AI technologies to possess a blend of cognitive and emotional understanding, bridging the divide between machine functionality and human emotional complexity. This progress has the potential to reshape how machines perceive and respond to human emotions, ushering in an era of empathetic and intuitive artificial systems. The primary research challenge involves developing models that can accurately interpret and analyze emotions from both auditory and textual data, whereby auditory data require optimizing CNNs to detect subtle and intense emotional fluctuations in speech, and textual data necessitate access to large, diverse datasets to effectively capture nuanced emotional cues in written language. This paper introduces a novel approach to multimodal emotion recognition, seamlessly integrating speech and text modalities to accurately infer emotional states. Employing CNNs, we meticulously analyze speech using Mel spectrograms, while a BERT-based model processes the textual component, leveraging its bidirectional layers to enable profound semantic comprehension. The outputs from both modalities are combined using an attention-based fusion mechanism that optimally weighs their contributions. The proposed method here undergoes meticulous testing on two distinct datasets: Carnegie Mellon University’s Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset and the Multimodal Emotion Lines Dataset (MELD). The results demonstrate superior efficacy compared to existing frameworks, achieving an accuracy of 88.4% and an F1-score of 87.9% on the CMU-MOSEI dataset, and a notable weighted accuracy (WA) of 67.81% and a weighted F1 (WF1) score of 66.32% on the MELD dataset. This comprehensive system offers precise emotion detection and introduces several significant advancements in the field.

Read full abstract

Textual Modalities Research Articles

Related Topics

Articles published on Textual Modalities

A global multimodal flood event dataset with heterogeneous text and multi-source remote sensing images

Vision-language constraint graph representation learning for unsupervised vehicle re-identification

The contribution of prosody to machine classification of schizophrenia

Seeing and Thinking about Urban Blue–Green Space: Monitoring Public Landscape Preferences Using Bimodal Data

Enhancing Multimodal Emotion Recognition through Attention Mechanisms in BERT and CNN Architectures

SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval

HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion

Similar modality completion-based multimodal sentiment analysis under uncertain missing modalities

Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding

Sketch2Prototype: rapid conceptual design exploration and prototyping with generative AI

PROMISE: A pre-trained knowledge-infused multimodal representation learning framework for medication recommendation

World futures through RT’s eyes: multimodal dataset and interdisciplinary methodology

Audio-textual multi-label demographic recognition of Arabic speakers using deep learning

Category of learning modality in English texts

EmoStyle: Emotion-Aware Semantic Image Manipulation with Audio Guidance

What do they “meme”? A metaphor-aware multi-modal multi-task framework for fine-grained meme understanding

Text-based person search via cross-modal alignment learning

A Systematic Review of Big Data Driven Education Evaluation

How Does the High Court Interpret the Constitution? A Qualitative Analysis between 2019–21

Attr4Vis: Revisiting Importance of Attribute Classification in Vision-Language Models for Video Recognition

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Textual Modalities Research Articles

Related Topics

Articles published on Textual Modalities

A global multimodal flood event dataset with heterogeneous text and multi-source remote sensing images

Vision-language constraint graph representation learning for unsupervised vehicle re-identification

The contribution of prosody to machine classification of schizophrenia

Seeing and Thinking about Urban Blue–Green Space: Monitoring Public Landscape Preferences Using Bimodal Data

Enhancing Multimodal Emotion Recognition through Attention Mechanisms in BERT and CNN Architectures

SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval

HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion

Similar modality completion-based multimodal sentiment analysis under uncertain missing modalities

Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding

Sketch2Prototype: rapid conceptual design exploration and prototyping with generative AI

PROMISE: A pre-trained knowledge-infused multimodal representation learning framework for medication recommendation

World futures through RT’s eyes: multimodal dataset and interdisciplinary methodology

Audio-textual multi-label demographic recognition of Arabic speakers using deep learning

Category of learning modality in English texts

EmoStyle: Emotion-Aware Semantic Image Manipulation with Audio Guidance

What do they “meme”? A metaphor-aware multi-modal multi-task framework for fine-grained meme understanding

Text-based person search via cross-modal alignment learning

A Systematic Review of Big Data Driven Education Evaluation

How Does the High Court Interpret the Constitution? A Qualitative Analysis between 2019–21

Attr4Vis: Revisiting Importance of Attribute Classification in Vision-Language Models for Video Recognition