Abstract
Rapid developments in sensor technology and mobile devices have produced a flood of social images, and these large-scale collections have attracted increasing attention from researchers. Existing approaches generally rely on recognizing object instances individually, using geo-tags, visual patterns, and similar cues. However, a social image represents a web of interconnected relations; the relations between entities carry semantic meaning and help a viewer differentiate between instances of the same kind of object. This article explores the joint learning of social images from the perspective of spatial relationships. Specifically, the model consists of three parts: (a) a module for deep semantic understanding of images based on a residual network (ResNet); (b) a deep semantic analysis module for text that goes beyond traditional bag-of-words methods; and (c) a joint reasoning module in which text weights are obtained from image features via self-attention, combined with a novel tree-based clustering algorithm. Experimental results on the Flickr30k and Microsoft COCO datasets demonstrate the effectiveness of the method. Moreover, our method takes spatial relations into account during matching.
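The abstract only names the three modules; as a rough illustration, the sketch below (not the authors' released code) shows how such a pipeline could be wired up in PyTorch. The class names, embedding dimensions, GRU text encoder, and the use of the image embedding as the attention query are all assumptions, and the tree-based clustering step is omitted.

```python
# Minimal sketch of the three-module design described in the abstract.
# Assumes PyTorch and torchvision; all hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class ImageEncoder(nn.Module):
    """(a) Deep semantic understanding of images based on a ResNet backbone."""
    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = resnet50(weights=None)  # pretrained weights could be loaded here
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        self.proj = nn.Linear(backbone.fc.in_features, embed_dim)

    def forward(self, images):                    # images: (B, 3, H, W)
        feats = self.features(images).flatten(1)  # (B, 2048)
        return self.proj(feats)                   # (B, embed_dim)


class TextEncoder(nn.Module):
    """(b) Sequential text encoder that keeps word order, unlike bag-of-words."""
    def __init__(self, vocab_size, embed_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, tokens):                    # tokens: (B, T)
        out, _ = self.gru(self.embed(tokens))     # (B, T, embed_dim)
        return out


class JointReasoning(nn.Module):
    """(c) Joint reasoning: word weights conditioned on image features via attention."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)

    def forward(self, img_emb, txt_seq):
        # The image embedding queries the word sequence, yielding per-word
        # attention weights that depend on the visual content.
        query = img_emb.unsqueeze(1)              # (B, 1, embed_dim)
        attended, weights = self.attn(query, txt_seq, txt_seq)
        return attended.squeeze(1), weights       # weights: (B, 1, T)
```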
Highlights
With the rise of cheap sensors, mobile terminals, and social networks, research on social images has made good progress in areas such as image retrieval, object classification, and scene understanding.
We aim to develop a method that learns the spatial relations across separate visual objects and texts for social image understanding. Therefore, this paper proposes a cross-modal framework that builds a joint model of texts and images to extract features, combining the advantages of the self-attention mechanism and deep learning models to generate interactive effects.
We focus on two image-text tasks: spatial relation modeling and image-text matching. The former covers both image-to-image and image-to-text settings, and the definitions of the two scenarios are straightforward: given an input image, the goal is to find the semantically meaningful relationships between entities. The second task is to find the sentences that best match the input images (a minimal sketch of this matching step follows below).
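For the image-text matching task, the sketch below shows one standard way such retrieval is scored, assuming image and sentence embeddings already live in a shared space (for example, produced by encoders like those above). The function name and cosine-similarity scoring are illustrative assumptions, not the paper's specific matching objective.

```python
# Sketch of ranking candidate sentences for a query image by cosine similarity.
import torch
import torch.nn.functional as F


def rank_sentences(image_emb, sentence_embs):
    """Return sentence indices ordered from best to worst match for one image.

    image_emb:     (D,)   embedding of the query image
    sentence_embs: (N, D) embeddings of candidate sentences
    """
    sims = F.cosine_similarity(image_emb.unsqueeze(0), sentence_embs, dim=1)  # (N,)
    return torch.argsort(sims, descending=True)


# Usage with random stand-in embeddings: retrieve the best of five candidates.
img = torch.randn(512)
caps = torch.randn(5, 512)
print(rank_sentences(img, caps)[0].item())
```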
Summary
With the rise of cheap sensors, mobile terminals, and social networks, research on social images has made good progress in areas such as image retrieval, object classification, and scene understanding. Wang et al. [5] present an algorithm that learns the relations between scenes, objects, and texts with the help of image-level labels. Such a training process requires a large number of paired image and text data, yet spatial relationships expressed in textual descriptions are very scarce in reality. Motivated by these observations, we aim to develop a method that learns the spatial relations across separate visual objects and texts for social image understanding. Previously proposed methods usually require additional annotations of relations, whereas our approach demands only image-level annotations.