Abstract

Recently, multimodal information has been taken into consideration for ground-based cloud classification in weather station networks, but the intrinsic correlations between the multimodal information and the visual information cannot be mined sufficiently. We propose a novel approach called hierarchical multimodal fusion (HMF) for ground-based cloud classification in weather station networks, which fuses deep multimodal features and deep visual features at two levels, i.e., low-level fusion and high-level fusion. The low-level fusion directly fuses the heterogeneous features and thus focuses on modality-specific fusion. The high-level fusion integrates the output of the low-level fusion with the deep visual features and the deep multimodal features, and can learn complex correlations among them owing to the deep fusion structure. We employ a single loss function to train the overall HMF framework so as to improve the discrimination of the cloud representations. Experimental results on the MGCD dataset indicate that our method outperforms competing methods, which verifies the effectiveness of the HMF for ground-based cloud classification.
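
To make the two fusion levels concrete, one plausible formulation (assuming concatenation-based fusion, which the abstract does not specify) is the following, where v and m denote the deep visual and deep multimodal feature vectors, [·; ·] is concatenation, σ is a nonlinearity, and the weight matrices W and biases b are hypothetical fusion parameters:

    \begin{aligned}
    \mathbf{f}_{\mathrm{low}}  &= \sigma\left(\mathbf{W}_{l}\,[\mathbf{v};\,\mathbf{m}] + \mathbf{b}_{l}\right),\\
    \mathbf{f}_{\mathrm{high}} &= \sigma\left(\mathbf{W}_{h}\,[\mathbf{f}_{\mathrm{low}};\,\mathbf{v};\,\mathbf{m}] + \mathbf{b}_{h}\right),\\
    \mathcal{L} &= -\sum_{k} y_{k}\,\log\,\mathrm{softmax}\left(\mathbf{W}_{c}\,\mathbf{f}_{\mathrm{high}} + \mathbf{b}_{c}\right)_{k},
    \end{aligned}

so that a single cross-entropy loss trains both fusion levels and both subnetworks end to end.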

Highlights

  • Ground-based cloud classification in weather station networks [1], [2] plays a critical role in many fields

  • We employ a single loss function to train the overall framework of hierarchical multimodal fusion (HMF) so as to further improve the discrimination of cloud representations

  • To further mine the correlations between the deep visual features and the deep multimodal features, we propose a high-level fusion strategy, which is implemented by fusing the output of the low-level fusion with the heterogeneous features

Summary

INTRODUCTION

Ground-based cloud classification in weather station networks [1], [2] plays a critical role in many fields. Since the ground-based cloud image and the multimodal information take different forms, the visual subnetwork and the multimodal subnetwork are designed as a convolutional neural network (CNN) and a multi-layer perceptron (MLP), respectively, so as to transform the heterogeneous inputs into feature vectors of the same dimension. To learn complex correlations between the visual information and the multimodal information, the HMF adopts a hierarchical fusion strategy comprising low-level fusion and high-level fusion. The proposed HMF fuses the heterogeneous features at different levels, which could mine the intrinsic correlations between the visual information and the multimodal information.
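
Below is a minimal PyTorch sketch of such an architecture. It is an illustration rather than the authors' implementation: the ResNet-18 backbone, the layer sizes, the concatenation-based fusion, and all names (HMF, visual_net, multimodal_net, low_fusion, high_fusion) are assumptions, and the seven classes and four multimodal inputs (temperature, humidity, pressure, wind speed) follow common descriptions of the MGCD dataset.

    # Hypothetical sketch of the HMF pipeline described above; layer sizes,
    # the CNN backbone, and concatenation-based fusion are assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class HMF(nn.Module):
        def __init__(self, num_classes=7, mm_dim=4, feat_dim=512):
            super().__init__()
            # Visual subnetwork: a CNN mapping the cloud image to a
            # feat_dim-dimensional vector.
            backbone = models.resnet18(weights=None)
            backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
            self.visual_net = backbone
            # Multimodal subnetwork: an MLP mapping the multimodal inputs
            # to a feature vector of the same dimension.
            self.multimodal_net = nn.Sequential(
                nn.Linear(mm_dim, 64), nn.ReLU(),
                nn.Linear(64, feat_dim), nn.ReLU(),
            )
            # Low-level fusion: directly fuse the heterogeneous features.
            self.low_fusion = nn.Sequential(
                nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            )
            # High-level fusion: integrate the low-level output with the
            # deep visual and deep multimodal features.
            self.high_fusion = nn.Sequential(
                nn.Linear(3 * feat_dim, feat_dim), nn.ReLU(),
            )
            self.classifier = nn.Linear(feat_dim, num_classes)

        def forward(self, image, multimodal):
            v = self.visual_net(image)            # deep visual features
            m = self.multimodal_net(multimodal)   # deep multimodal features
            low = self.low_fusion(torch.cat([v, m], dim=1))
            high = self.high_fusion(torch.cat([low, v, m], dim=1))
            return self.classifier(high)

    # The overall framework is trained end to end with a single loss.
    model = HMF()
    criterion = nn.CrossEntropyLoss()
    images = torch.randn(8, 3, 224, 224)          # dummy cloud images
    meta = torch.randn(8, 4)                      # dummy multimodal inputs
    labels = torch.randint(0, 7, (8,))
    loss = criterion(model(images, meta), labels)
    loss.backward()

Because a single cross-entropy loss drives both fusion branches and both subnetworks, gradients from the classifier shape the visual and multimodal representations jointly, which is what allows the correlations between the two modalities to be learned.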
