Detecting human–object interaction with multi-level pairwise feature network

Hanchao Liu, Xiaolei Huang, Tai-Jiang Mu

doi:10.1007/s41095-020-0188-2

Open Access

https://doi.org/10.1007/s41095-020-0188-2

Abstract

Human–object interaction (HOI) detection, which aims to infer ⟨human, action, object⟩ triplets within an image, is crucial for human-centric image understanding. Recent studies often exploit visual features and the spatial configuration of a human–object pair to learn the action linking the human and the object. We argue that this paradigm of pairwise feature extraction and action inference can be applied not only at the level of whole human and object instances, but also at the part level, where a body part interacts with an object, and at the semantic level, where the semantic label of an object is considered along with human appearance and human–object spatial configuration. We thus propose a multi-level pairwise feature network (PFNet) for detecting human–object interactions. The network consists of three parallel streams that characterize HOI using pairwise features at these three levels; the three streams are finally fused to give the action prediction. Extensive experiments show that the proposed PFNet outperforms other state-of-the-art methods on the V-COCO dataset and achieves results comparable to the state of the art on the HICO-DET dataset.
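The fusion of the three streams described above can be illustrated with a minimal sketch. The abstract does not say how the streams are fused, so the rule used here (averaging per-action sigmoid scores) is an assumption made purely for illustration, and the function names, action list, and logit values are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_streams(instance_logits, part_logits, semantic_logits):
    """Late fusion of three streams by averaging per-action sigmoid scores.

    ASSUMPTION: averaging is an illustrative choice, not necessarily
    the fusion rule used by PFNet.
    """
    if not (len(instance_logits) == len(part_logits) == len(semantic_logits)):
        raise ValueError("all streams must score the same set of actions")
    fused = []
    for triple in zip(instance_logits, part_logits, semantic_logits):
        fused.append(sum(sigmoid(v) for v in triple) / 3.0)
    return fused


# Hypothetical action vocabulary and per-stream logits for one human-object pair.
ACTIONS = ["ride", "hold", "look"]
scores = fuse_streams(
    instance_logits=[2.0, -1.0, 0.0],
    part_logits=[1.5, -0.5, 0.0],
    semantic_logits=[3.0, -2.0, 0.0],
)
best = ACTIONS[max(range(len(scores)), key=scores.__getitem__)]
print(best, [round(s, 3) for s in scores])
```

Because each action is scored independently with a sigmoid rather than a softmax, one human–object pair may be assigned several actions at once, which matches the multi-label nature of HOI detection.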

Highlights

Deep learning has witnessed great progress in visual recognition [1] and object detection [2,3,4]
Current attempts to address the problem of human–object interaction (HOI) detection usually rely on considering all ⟨human, object⟩ pairs in an image, where the pairwise features comprise three components: visual features of the human, visual features of the object, and spatial configuration linking the human and object [7, 11]
Because interacting ⟨human, action, object⟩ triplets are relatively sparse among all triplets, some previous works [12, 15] further predict an interaction or affinity term to filter out human–object pairs that are not interacting

Summary

Introduction

Manuscript received: 2020-06-26; accepted: 2020-07-20.

Deep learning has witnessed great progress in visual recognition [1] and object detection [2,3,4]. Current attempts to address the problem of HOI detection usually rely on considering all ⟨human, object⟩ pairs in an image, where the pairwise features comprise three components: visual features of the human, visual features of the object, and spatial configuration linking the human and object [7, 11]. These components help to recognize actions with a typical spatial interaction pattern, e.g., ride, or actions strongly correlated with the presence of a person or specific objects. PFNet aggregates pairwise visual and spatial features at three levels and incorporates both local body parts and semantic priors to achieve more robust and accurate HOI detection. A comparison with other methods on two large-scale datasets, V-COCO [17] and HICO-DET [7], shows that our method achieves state-of-the-art performance on V-COCO and comparable results on HICO-DET, without needing any extra annotation.
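The three-component pairwise feature mentioned above (human visual features, object visual features, and spatial configuration) can be sketched as follows. This summary does not specify how the spatial configuration is encoded, so the five-number encoding used here (normalized centre offset, log size ratios, and IoU) is an assumption chosen for brevity; the function names and sample values are likewise hypothetical.

```python
import math


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def spatial_config(human_box, object_box):
    """ASSUMED encoding: object centre offset relative to the human box,
    log width/height ratios, and IoU. Not the paper's stated encoding."""
    hx1, hy1, hx2, hy2 = human_box
    ox1, oy1, ox2, oy2 = object_box
    hw, hh = hx2 - hx1, hy2 - hy1
    ow, oh = ox2 - ox1, oy2 - oy1
    if min(hw, hh, ow, oh) <= 0:
        raise ValueError("boxes must have positive width and height")
    dx = ((ox1 + ox2) / 2 - (hx1 + hx2) / 2) / hw
    dy = ((oy1 + oy2) / 2 - (hy1 + hy2) / 2) / hh
    return [dx, dy, math.log(ow / hw), math.log(oh / hh),
            box_iou(human_box, object_box)]


def pairwise_feature(human_feat, object_feat, human_box, object_box):
    """Concatenate the three components into one pairwise feature vector."""
    return (list(human_feat) + list(object_feat)
            + spatial_config(human_box, object_box))


# Hypothetical 2-D appearance features and boxes for one human-object pair.
feat = pairwise_feature([0.1, 0.9], [0.4, 0.2],
                        human_box=(0, 0, 10, 20), object_box=(5, 10, 15, 20))
print(len(feat), feat[4:])
```

In a real detector the appearance features would come from a backbone network and would be far higher-dimensional; the short lists here only keep the example self-contained.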

Related work

Network architecture

Instance-level pairwise feature stream

Part-level pairwise feature stream

Semantic-level pairwise feature stream

Loss function

Datasets

Evaluation metrics

Implementation details

Comparison with state of the art

Method

Effect of each level of pairwise features

Components in part-level feature

Limitations

Conclusions

Journal: Computational Visual Media	Publication Date: Oct 19, 2020
Citations: 17	License type: open-access


