Abstract

To understand the visual world, a system must recognize not only the individual instances but also how they interact. Humans are at the center of such interactions. Detection of human–object interaction (HOI) is a growing research field in computer vision. However, identifying HOIs remains challenging because of the large label space of verbs and their combinations with various object types, and still needs much research. We focus on HOIs in images, which is necessary for a deeper understanding of the scene. In addition to two-dimensional (2D) information, such as the appearance of humans and objects and their spatial locations, the three-dimensional (3D) state, especially the configuration of the human body and the object as well as their locations and spatial relationship, can play an important role in learning HOI. Mapping from the 2D image to the 3D world adds depth information to the problem. These considerations led us to collect 3D information along with the 2D features of the images to obtain more accurate results. We show that 3D attributes, such as the face transformation, the viewing angle, the position of an object, and its location relative to the human face, can improve HOI learning. Experiments on large-scale data show that our method improves interaction recognition.
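As a rough illustration of the kind of 2D/3D feature fusion the abstract describes, the sketch below concatenates 2D appearance and spatial features of a human–object pair with a small vector of 3D attributes (e.g., face rotation and translation, viewing angle, object position relative to the face) and feeds them to a simple verb classifier. This is a minimal sketch under assumed names, dimensions, and architecture, not the authors' implementation; the 117 verb classes follow HICO-DET-style benchmarks and are also an assumption here.

```python
# Minimal sketch (not the authors' implementation): fusing 2D and 3D cues
# for HOI verb classification. All feature names, dimensions, and the
# fusion architecture are illustrative assumptions.
import torch
import torch.nn as nn


class HOIFusionClassifier(nn.Module):
    def __init__(self, dim_2d: int = 1024, dim_3d: int = 16, num_verbs: int = 117):
        super().__init__()
        # dim_2d: appearance + 2D spatial features of the human-object pair (assumed)
        # dim_3d: 3D attributes, e.g., face transformation, viewing angle,
        #         object position relative to the face (assumed)
        self.mlp = nn.Sequential(
            nn.Linear(dim_2d + dim_3d, 512),
            nn.ReLU(),
            nn.Linear(512, num_verbs),
        )

    def forward(self, feats_2d: torch.Tensor, feats_3d: torch.Tensor) -> torch.Tensor:
        # Concatenate 2D and 3D cues, then predict per-verb interaction scores.
        fused = torch.cat([feats_2d, feats_3d], dim=-1)
        return self.mlp(fused)


# Example usage with random features for a batch of 4 human-object pairs.
model = HOIFusionClassifier()
scores = model(torch.randn(4, 1024), torch.randn(4, 16))
print(scores.shape)  # torch.Size([4, 117])
```

Late fusion by concatenation is only one possible design choice; the point of the sketch is simply that the 3D attributes enter the classifier as additional input features alongside the 2D ones.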
