Multi-stream neural network fused with local information and global information for HOI detection

Limin Xia, Rui Li

doi:10.1007/s10489-020-01794-1

Abstract

Human-Object Interaction (HOI) detection is a recent human-centric visual relationship detection task that is important for the deep understanding of visual scenes. Because the visual scenes in images are complex, HOI detection remains challenging, and its most critical part is feature extraction and representation. Some existing approaches rely solely on local region information and ignore global contextual information, even though global context contributes to detection in some HOI categories. Other approaches incorporate global contextual information but lose local region information. In this work, we propose a multi-stream neural network architecture, composed of three special modules, that employs both local region information and global contextual information for HOI detection. The model can therefore detect HOI categories that depend on local region information as well as those that depend on global contextual information, and so it considers the HOI categories in the dataset more fully. Compared with other existing approaches, the proposed model shows improved performance on the V-COCO and HICO-DET benchmark datasets, especially when predicting rare HOI categories.

Full Text