Recent one-stage human–object interaction (HOI) detection methods tend to split the original task into multiple sub-tasks using a multi-branch network structure. However, insufficient attention has been paid to information exchange between these branches. Cascaded structures are restricted to a single inference path, while fully parallel methods break the associations between different pieces of information. In addition, fusing different features can introduce noise that degrades detection performance. To address these issues, this article proposes a one-stage, three-branch parallel HOI detection method that treats HOI detection as three separate sub-tasks (human detection, object detection, and interaction detection) and leverages three distinct reasoning relationships to generate richer relational information. First, an auxiliary feature fusion (AFF) module is introduced, which integrates features that were originally extracted independently into fused features enriched with supplementary information. This strengthens communication between the branches while the three sub-tasks are handled concurrently, allowing more contextual information to be exchanged. Second, to mitigate the noise generated during fusion, a fusion noise suppression (FNS) module is introduced, which suppresses this noise and improves the model's performance on interaction detection. Finally, experiments on two major benchmark datasets show that the proposed method outperforms previous methods, and ablation studies confirm the effectiveness of each component.
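
The abstract does not specify the internals of the AFF or FNS modules, so the following is only a toy sketch of the general idea: three parallel branches exchange features, and a gate limits how much of the fused signal each branch admits. Everything here is an assumption made for illustration, including the function names (`fuse`, `suppress`, `three_branch_step`), the averaging rule, the `alpha` blend weight, and the sigmoid gate; none of it is taken from the proposed method.

```python
import math

def fuse(own, others, alpha=0.5):
    """Hypothetical stand-in for feature fusion (not the paper's AFF):
    blend a branch's own feature with the mean of the other branches."""
    mean_other = [sum(vals) / len(vals) for vals in zip(*others)]
    return [(1 - alpha) * o + alpha * m for o, m in zip(own, mean_other)]

def suppress(fused, own):
    """Hypothetical stand-in for noise suppression (not the paper's FNS):
    an elementwise sigmoid gate, driven by the agreement between the own
    and fused features, mixes the fused feature with the own feature."""
    out = []
    for f, o in zip(fused, own):
        gate = 1.0 / (1.0 + math.exp(-(o * f)))  # gate lies in (0, 1)
        out.append(gate * f + (1.0 - gate) * o)
    return out

def three_branch_step(human, obj, inter):
    """Process the three branches in parallel: each branch fuses in the
    other two branches' features, then gates the result."""
    feats = {"human": human, "object": obj, "interaction": inter}
    result = {}
    for name, own in feats.items():
        others = [v for k, v in feats.items() if k != name]
        result[name] = suppress(fuse(own, others), own)
    return result

if __name__ == "__main__":
    out = three_branch_step([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
    for branch, feature in out.items():
        print(branch, feature)
```

In this sketch the gate falls back toward a branch's own feature when the fused signal disagrees with it, which is one simple way to read "suppressing fusion noise"; a learned attention or convolutional gate would be an equally plausible reading.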