People hope that computers can keep developing their intelligence so that, like humans, they can “see” the world and “recognize” a visual event. We propose a computer vision approach to recognizing Human-Object Interaction (HOI). The technique aggregates significant contextual features from human-object interactions and scene recognition. We design a two-branch architecture consisting of a main branch for HOI detection and a supplementary branch for scene recognition. We explore deep learning models through knowledge distillation and a Cross Branch Integration mechanism that encodes the models into a graph neural network architecture. We construct a knowledge graph to merge high-level context information. When trained collaboratively, these models offer computational efficiency and strong contextual knowledge.
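
As an illustration of the knowledge distillation step mentioned above, the following is a minimal sketch of the commonly used temperature-softened distillation loss. It is an assumption made for illustration: the abstract does not state which loss is used or which model acts as teacher and which as student, and the names `softmax`, `distillation_loss`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumed formulation (standard soft-target distillation); not taken
    from the abstract. Assumes logits are moderate enough that no
    student probability underflows to zero.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits over three interaction classes.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge; a higher `temperature` spreads probability mass over more classes, exposing more of the teacher's relative class preferences.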