Abstract
The rapid development of deep learning in recent years has led to breakthroughs in a growing number of problems, such as image recognition and object detection. Human-object interaction (HOI) detection is an important problem in image understanding; it has higher practical value than image classification or object detection, but is still not well solved. Unlike HOI recognition, HOI detection must not only determine whether HOIs are present but also estimate their locations, so the problem is defined as predicting a human bounding box and an object bounding box together with an interaction class label that connects them. In this paper, we propose an approach that uses multiple features to solve the HOI detection task. We use different methods to extract visual features and spatial features from images, and fuse them to provide more effective support for HOI detection. The core of our approach is the Visual Feature Extracted Model, which is based on a convolutional neural network with an attention module and provides more detailed visual features. We validate our approach on the recently introduced Verbs in COCO (V-COCO) dataset, and the results indicate that our approach achieves substantial improvements in HOI detection.
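To make the problem definition concrete, the following minimal Python sketch shows what a single HOI detection could look like as a data structure, along with one possible way to derive a spatial feature from a human-object box pair and fuse it with a visual feature. The names `HOIDetection`, `spatial_feature`, and `fuse` are hypothetical, and both the box-offset encoding and fusion by concatenation are assumptions made for illustration; the abstract does not specify how the spatial features are computed or how the fusion is performed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixels; this layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class HOIDetection:
    """One detected interaction: a human box, an object box, and a verb label."""
    human_box: Box
    object_box: Box
    interaction: str   # e.g. a V-COCO verb such as "ride"
    score: float       # confidence of the triplet


def spatial_feature(human_box: Box, object_box: Box) -> List[float]:
    """Hypothetical spatial feature: the object's centre offset and size,
    both expressed relative to the human box."""
    hx1, hy1, hx2, hy2 = human_box
    ox1, oy1, ox2, oy2 = object_box
    hw, hh = hx2 - hx1, hy2 - hy1
    ow, oh = ox2 - ox1, oy2 - oy1
    # Centre offsets normalised by the human box width and height.
    dx = ((ox1 + ox2) / 2 - (hx1 + hx2) / 2) / hw
    dy = ((oy1 + oy2) / 2 - (hy1 + hy2) / 2) / hh
    return [dx, dy, ow / hw, oh / hh]


def fuse(visual: List[float], spatial: List[float]) -> List[float]:
    """Fuse features by simple concatenation (an assumed fusion scheme)."""
    return list(visual) + list(spatial)


human = (0.0, 0.0, 10.0, 20.0)
obj = (10.0, 10.0, 20.0, 20.0)
spatial = spatial_feature(human, obj)
fused = fuse([0.1, 0.2], spatial)  # the visual vector here is a placeholder
detection = HOIDetection(human, obj, "hold", 0.9)
```

In a full system the placeholder visual vector would come from a CNN backbone with an attention module, and the fused vector would feed an interaction classifier; both are outside the scope of this sketch.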