Abstract

To address the complex data redundancy generated by sensing users in mobile crowd sensing, which leads to low task recommendation accuracy and high sensing cost, a multimodal data fusion optimization and task recommendation method is proposed. The method extracts modal features using BERT and Faster R-CNN, and obtains self-attention and cross-guided features through an attention mechanism to achieve intra-modal and inter-modal information sharing and to reduce the risk of fusing unrelated modal features. The resulting modal features are then fused hierarchically: fusion features of different granularity are extracted by capturing implicit features within a single modality and complementary features across modalities, and these features are jointly optimized so that the fusion results focus on the information users care about in multimodal historical task data. The designed cross-guided self-attention mechanism improves the accuracy of multimodal data fusion by fully fusing the modal data and jointly optimizing their different fusion features, which increases the sensing user's interest in completing tasks, strengthens their motivation to participate in sensing tasks, and improves sensing quality. Finally, the similarity between a new task and the historical tasks is calculated to decide whether to recommend the new task to the sensing user. Experiments on the Flickr8k and Pascal Sentence datasets show that our proposed method can effectively fuse multimodal data and improve the accuracy of task recommendations.
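
To make the described pipeline concrete, the sketch below shows one way the cross-guided self-attention fusion and the similarity-based recommendation decision could be organized in PyTorch. The module structure, feature dimensions, pooling choices, and the cosine-similarity threshold are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Minimal sketch, assuming BERT-style text features and Faster R-CNN-style
# region features projected to a common dimension. Names and hyperparameters
# below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossGuidedFusion(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        # Intra-modal self-attention for text and image features
        self.text_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Inter-modal cross-guided attention (each modality attends to the other)
        self.text2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2text = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Simple hierarchical fusion head over the concatenated pooled features
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, img_feat):
        # text_feat: (B, T, dim) token features; img_feat: (B, R, dim) region features
        t, _ = self.text_self(text_feat, text_feat, text_feat)  # intra-modal sharing
        v, _ = self.img_self(img_feat, img_feat, img_feat)
        t_guided, _ = self.text2img(t, v, v)                    # cross-guided attention
        v_guided, _ = self.img2text(v, t, t)
        # Pool each modality, concatenate, and project to a joint task embedding
        fused = torch.cat([t_guided.mean(dim=1), v_guided.mean(dim=1)], dim=-1)
        return self.proj(fused)                                 # (B, dim)

def recommend(new_task_emb, history_embs, threshold=0.7):
    """Decide whether to recommend a new task by comparing its fused embedding
    with the user's historical task embeddings via cosine similarity."""
    sims = F.cosine_similarity(new_task_emb.unsqueeze(0), history_embs, dim=-1)
    return bool(sims.max() >= threshold)
```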
