Abstract

Human-driven Edge Computing (HEC) integrates humans, devices, the Internet, and information, and mobile crowd sensing has become an important means of data collection in this setting. In HEC, the data collected through large-scale sensing usually spans a variety of modalities. Each modality carries unique information and attributes, and the modalities can complement one another, so combining data from many modalities yields more information. However, current deep learning models usually handle only bimodal data. For artificial intelligence to make further breakthroughs in understanding the real world, it must be able to process data of different modalities together, and the key step is mapping these modalities into the same space. To process multimodal data better, we propose a fusion and classification method for multimodal data. First, a multimodal data space is constructed, and data of different modalities are mapped into it to obtain a unified representation. Then the representations of the different modalities are fused through bilinear pooling, and the fused vectors are used for the classification task. Experiments on a multimodal data set verify that the fused multimodal representation is effective and that its classification accuracy exceeds that of single-modal data.
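The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the encoder architectures, feature dimensions, and class count are all assumptions.

```python
import torch
import torch.nn as nn

class MultimodalFusionClassifier(nn.Module):
    """Sketch of the abstract's pipeline: per-modality encoders map inputs
    into a shared latent space, bilinear pooling fuses the two latent
    vectors, and a linear head classifies the fused representation.
    All dimensions below are illustrative assumptions."""

    def __init__(self, dim_a=2048, dim_b=300, latent=256, fused=512, n_classes=10):
        super().__init__()
        # Modality-specific encoders projecting into the shared latent space
        self.enc_a = nn.Sequential(nn.Linear(dim_a, latent), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, latent), nn.ReLU())
        # Bilinear pooling captures pairwise interactions between modalities
        self.bilinear = nn.Bilinear(latent, latent, fused)
        self.classifier = nn.Linear(fused, n_classes)

    def forward(self, x_a, x_b):
        z_a = self.enc_a(x_a)                         # modality A -> shared space
        z_b = self.enc_b(x_b)                         # modality B -> shared space
        fused = torch.relu(self.bilinear(z_a, z_b))   # bilinear fusion
        return self.classifier(fused)                 # class logits
```

For example, `MultimodalFusionClassifier()(torch.randn(4, 2048), torch.randn(4, 300))` returns a batch of class logits for four paired samples.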

Highlights

  • The rapid development of communication technology enables information collection and dissemination through various mobile crowd sensing (MCS) services in the Human-driven Edge Computing (HEC) environment [1]

  • The fused representation can be used to classify multimodal data, while at the same time the entire network is trained by back-propagating the label, so that label information is incorporated into the representation (see the training sketch after this list)

  • To solve this problem, we propose a multimodal data representation method that maps data of different modalities into the same latent space while retaining their original semantics and correspondences
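A minimal sketch of the shared-space idea from the highlights: once two encoders project paired samples into one latent space, a matching objective keeps representations of the same object close across modalities. The hinge-style loss below is an illustrative choice, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def matching_loss(z_a, z_b, margin=0.2):
    """Margin-based matching loss (an illustrative assumption): paired
    latent vectors (same object, different modality) should score higher
    than any mismatched pair by at least `margin`."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    sim = z_a @ z_b.t()                    # cosine similarities, B x B
    pos = sim.diag().unsqueeze(1)          # matched pairs lie on the diagonal
    # Zero out the diagonal so matched pairs are not penalized against themselves
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    hinge = F.relu(margin + sim - pos) * mask
    return hinge.sum() / mask.sum()
```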


Summary

INTRODUCTION

The rapid development of communication technology enables information collection and dissemination through various mobile crowd sensing (MCS) services in the Human-driven Edge Computing (HEC) environment [1]. The fused representation can be used to classify multimodal data, and at the same time the entire network is trained by back-propagating the label so that label information is incorporated. Through this two-stage method, the matching module can, on the one hand, bridge the gap between different modalities and map them into a unified space; on the other hand, the labels supply additional information, making the embedding into the space more robust. The innovations of this paper are as follows: 1) A latent space is proposed into which data of various modalities can be mapped; the mapped data retain their original correlations as much as possible, and representations of the same object are similar in the space.
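The two-stage procedure can be sketched as below, reusing the model and matching loss from the earlier sketches. The optimizer, learning rate, epoch counts, and the `loader` yielding paired modalities with labels are all hypothetical; the paper does not specify them here.

```python
import torch

# Hypothetical hyperparameters, not the paper's values.
ALIGN_EPOCHS, CLS_EPOCHS, LR = 5, 20, 1e-4

model = MultimodalFusionClassifier()   # from the sketch after the abstract
opt = torch.optim.Adam(model.parameters(), lr=LR)
ce = torch.nn.CrossEntropyLoss()

# Stage 1: train the matching module so that paired samples from the two
# modalities land close together in the unified latent space.
for _ in range(ALIGN_EPOCHS):
    for x_a, x_b, _y in loader:        # `loader` is assumed to yield paired data
        loss = matching_loss(model.enc_a(x_a), model.enc_b(x_b))
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 2: train the whole network on the labels; back-propagating the
# classification loss folds label information into the shared space.
for _ in range(CLS_EPOCHS):
    for x_a, x_b, y in loader:
        loss = ce(model(x_a, x_b), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```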

RELATED WORK
MULTIMODAL FUSION AND CLASSIFICATION METHODS
FUSION AND CLASSIFICATION PROCESS
Findings
EXPERIMENTS