End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

Mingrui Wu, Yunhang Shen, Mingbao Lin, Jiaxin Gu, Xiaoshuai Sun, Chao Chen

doi:10.1609/aaai.v37i3.25385

Abstract

Most existing Human-Object Interaction (HOI) detection methods rely heavily on full annotations with predefined HOI categories, which are limited in diversity and costly to scale further. We aim to advance zero-shot HOI detection so that both seen and unseen HOIs are detected simultaneously. The fundamental challenges are discovering potential human-object pairs and identifying novel HOI categories. To overcome these challenges, we propose a novel End-to-end zero-shot HOI Detection (EoID) framework based on vision-language knowledge distillation. We first design an Interactive Score module combined with a Two-stage Bipartite Matching algorithm to distinguish interacting human-object pairs in an action-agnostic manner. We then transfer the distribution of action probabilities from the pretrained vision-language teacher, together with the seen ground truth, to the HOI model to attain zero-shot HOI classification. Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs. Our method also outperforms the previous SOTA under various zero-shot settings. Moreover, it generalizes to large-scale object detection data, which allows the action set to be scaled up further. The source code is available at: https://github.com/mrwu-mac/EoID.
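The distillation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the merge rule (ground truth overrides the teacher score wherever a seen action is annotated), the binary cross-entropy loss, and the names merge_targets and distillation_loss are all assumptions chosen for illustration. The actual formulation is in the paper and the linked repository.

```python
import math

def merge_targets(teacher_probs, seen_labels):
    """Combine teacher soft scores with seen ground-truth labels.

    Hypothetical rule (an assumption, not from the paper): an annotated
    seen action gets target 1.0; otherwise the teacher's score is kept.
    """
    return [max(p, float(y)) for p, y in zip(teacher_probs, seen_labels)]

def distillation_loss(student_probs, targets, eps=1e-7):
    """Mean binary cross-entropy between student predictions and soft targets."""
    total = 0.0
    for s, t in zip(student_probs, targets):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))
    return total / len(targets)

# Toy example with three actions for one human-object pair.
teacher = [0.10, 0.70, 0.05]   # scores from a vision-language teacher (made up)
seen_gt = [1, 0, 0]            # only the first action is an annotated seen label
targets = merge_targets(teacher, seen_gt)

student = [0.80, 0.60, 0.10]   # student action probabilities (made up)
loss = distillation_loss(student, targets)
```

In this toy case the second action has no annotation, yet the teacher's score of 0.70 still contributes a soft target. That is the sense in which an unseen action can receive supervision without ground-truth labels.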


Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 10
