From detection to understanding: A survey on representation learning for human-object interaction

Tianlun Luo, Steven Guan, Rui Yang, Jeremy Smith

doi:10.1016/j.neucom.2023.126243

Abstract

Human-Object Interaction (HOI) detection is a critical topic in visual understanding. The development of deep learning models has profoundly reshaped research on HOI detection. Deep convolutional neural networks improved object recognition accuracy on static images and gave rise to a detection-based stream of HOI methods, which approach HOI detection as a classification problem. Another stream of methods seeks a deeper understanding of the information shown in images; this survey refers to them as HOI understanding methods. HOI understanding methods usually draw on external linguistic data so that deep models can learn more about the images. Additionally, some HOI understanding methods exploit graph neural networks (GNNs) to increase the model's inference accuracy.

Full Text