Abstract

Human-Object Interaction (HOI) detection is a cornerstone of advanced visual understanding, aiming to identify relationships and interactions between humans and objects in images. Previous Transformer-based methods commonly use traditional query embeddings to predict HOI, but this approach suffers from slow training convergence. Although recent research defines HOI queries as reference points, their semantic information remains ambiguous, and object scale differences are ignored. To address these issues, we propose, for the first time, to use anchor boxes as queries for HOI detection, which significantly accelerates convergence. Furthermore, to enable anchor boxes to focus efficiently on HOI features, we design an end-to-end Specific Query Anchor Boxes (SQAB) network. Our method comprises a Hierarchical Detection Branch (HDB) and an Interaction Refinement Branch (IRB). First, HDB uses specific query anchor boxes for prediction on multi-scale feature maps and uses relation content queries to associate contextual information. In addition, IRB utilizes multi-scale body-part masks to guide the model to focus effectively on key interaction regions between humans and objects, improving performance on interaction categories. Experimental results show that SQAB outperforms the baseline while requiring only 25 training epochs on the widely used HOI benchmark datasets (V-COCO, HICO-DET, and HOI-A). On HICO-DET and HOI-A, mean average precision (mAP) improves by approximately 5.99% and 3.02%, respectively; on V-COCO, SQAB increases mAP by up to 10.57%.
