Abstract

Video visual relation detection, which aims to detect the visual relations between objects in the form of relation triplets <subject, predicate, object> (e.g., "person-ride-bike", "dog-toward-car"), is a significant and fundamental task in computer vision. However, most existing work on visual relation detection focuses on static images; modeling the non-static relationships in videos has drawn little attention due to the lack of large-scale video dataset support. In this work, we propose a video dataset named Video Predicate Detection and Reasoning (VidPDR) for dynamic video visual relation detection, which consists of 1,000 videos with dense, manually labeled dynamic annotations covering 21 object classes and 37 predicate classes. Moreover, we propose a novel spatio-temporal feature extraction framework based on 3D Convolutional Neural Networks (ST3DCNN), which includes three modules: 1) object trajectory detection, 2) short-term relation prediction, and 3) greedy relational association. We conducted experiments on public datasets and our own dataset (VidPDR). The results demonstrate that our proposed method achieves substantial improvements over state-of-the-art baselines.
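To make the role of the third module concrete, the sketch below illustrates one common way a greedy relational association step can merge short-term relation predictions from overlapping video segments into long-term relation instances. This is a minimal illustrative sketch, not the exact procedure of ST3DCNN; the data structures (SegmentRelation, RelationInstance) and the merge rule are assumptions introduced here for illustration.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    # Hypothetical container for one short-term prediction on a video segment.
    @dataclass
    class SegmentRelation:
        triplet: Tuple[str, str, str]   # (subject, predicate, object), e.g. ("person", "ride", "bike")
        start: int                      # first frame of the segment
        end: int                        # last frame of the segment
        score: float                    # confidence of the short-term predictor

    @dataclass
    class RelationInstance:
        triplet: Tuple[str, str, str]
        start: int
        end: int
        scores: List[float] = field(default_factory=list)

    def greedy_association(predictions: List[SegmentRelation]) -> List[RelationInstance]:
        """Greedily merge overlapping segment-level predictions that share the same triplet."""
        # Process higher-confidence predictions first so they seed the long-term instances.
        predictions = sorted(predictions, key=lambda p: p.score, reverse=True)
        instances: List[RelationInstance] = []
        for pred in predictions:
            merged = False
            for inst in instances:
                same_triplet = inst.triplet == pred.triplet
                overlaps = pred.start <= inst.end and pred.end >= inst.start
                if same_triplet and overlaps:
                    # Extend the existing instance to cover the new segment.
                    inst.start = min(inst.start, pred.start)
                    inst.end = max(inst.end, pred.end)
                    inst.scores.append(pred.score)
                    merged = True
                    break
            if not merged:
                instances.append(RelationInstance(pred.triplet, pred.start, pred.end, [pred.score]))
        return instances

Sorting predictions by confidence before merging is the "greedy" part: each segment-level prediction is attached to the first compatible long-term instance it overlaps, without revisiting earlier decisions.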

Highlights

  • As a bridge between dynamic object detection and textual information in video information retrieval, traditional video visual relation detection aims to explore the non-static interaction knowledge among co-occurring objects in a video

  • To evaluate the effectiveness of the features extracted by the 3D Convolutional Neural Network, we compared our method with several video visual relation detection models

  • ST3DCNN achieved the best performance on most evaluation metrics, demonstrating that features extracted by a 3D Convolutional Neural Network outperform manually encoded features in video visual relation detection

Summary

INTRODUCTION

As a bridge between dynamic object detection and textual information in video information retrieval, traditional video visual relation detection aims to explore the non-static interaction knowledge among co-occurring objects in a video. Prior work modeled the spatio-temporal relations between objects in a given video with a fully connected graph structure and an energy function with trainable hyperparameters, and achieved strong performance. Features encoded by the last fully connected layer of such a network can achieve satisfactory performance in image detection, but these image-based representations lack the temporal information between adjacent frames and are therefore not suitable for video-based problems. The theoretical support for our approach is that simple visual relationships can be extracted quickly and effectively from short videos, and complex visual relationships can always be inferred from simple ones. On this basis, we propose an object trajectory detection module and a relation prediction module.
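To illustrate the contrast with frame-wise image features, the following is a minimal sketch (assuming PyTorch and an arbitrary small architecture, not the exact ST3DCNN configuration) of how a 3D convolutional backbone pools features over a clip of frames so that temporal information between adjacent frames is preserved:

    import torch
    import torch.nn as nn

    class SpatioTemporalEncoder(nn.Module):
        """Toy 3D-CNN backbone: input is a clip of shape (batch, channels, frames, height, width)."""
        def __init__(self, feat_dim: int = 256):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv3d(3, 64, kernel_size=3, padding=1),    # convolves over time as well as space
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),           # keep temporal resolution, downsample space
                nn.Conv3d(64, feat_dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),                       # global spatio-temporal pooling
            )

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            return self.backbone(clip).flatten(1)              # (batch, feat_dim) clip-level feature

    # Usage: an 8-frame RGB clip, e.g. cropped around an object trajectory.
    encoder = SpatioTemporalEncoder()
    clip = torch.randn(2, 3, 8, 112, 112)                      # (batch, C, T, H, W)
    features = encoder(clip)                                   # torch.Size([2, 256])

Unlike features taken from the last fully connected layer of an image classifier applied frame by frame, the 3D kernels here convolve jointly over space and time, so motion between adjacent frames contributes to the clip-level representation.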

OBJECT TRAJECTORY DETECTION
Method
RELATION PREDICTION
EXPERIMENT
DATASETS
EVALUATION METRICS
COMPARED METHODS
RESULT
CONCLUSION