Human–object interaction (HOI) detection is important for understanding human-centric scenes, and its core difficulty lies in learning structural information from various types of relations. Existing works tackle it by introducing spatial context, extra knowledge, or graph-based propagation on top of the original hard labels. However, they still struggle with action co-occurrence and complex HOIs. In this paper, we propose to recycle the ground-truth annotations to extract implicit information for structure representations in HOIs. The action-aware closeness labeling (ACL) task is designed to capture contextual information based on the statistics of action co-occurrence in the data source. Furthermore, we present an H-O relation graph supervision (RGS) that yields more reliable relations in complicated scenes by constraining the attention weights of the H-O relation graph. Such direct supervision on mutual relations has been overlooked in existing works. Experiments on HOI detection with the V-COCO and HICO-DET datasets indicate the superiority of the proposed method, even without any extra knowledge.
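To make the notion of action co-occurrence statistics concrete, the following minimal sketch estimates the conditional frequency of one action given another from multi-hot ground-truth labels. It is only an illustration of this kind of statistic, not the ACL labeling procedure itself; the function name `cooccurrence_matrix`, the three action names, and the toy annotations are all hypothetical.

```python
from typing import List


def cooccurrence_matrix(labels: List[List[int]]) -> List[List[float]]:
    """Estimate P(action j | action i) from multi-hot action labels.

    labels[n][a] is 1 if human-object pair n is annotated with action a.
    (Illustrative assumption: one multi-hot vector per annotated pair.)
    """
    num_actions = len(labels[0])
    counts = [[0] * num_actions for _ in range(num_actions)]

    # Count how often each ordered pair of actions appears on the same H-O pair.
    for row in labels:
        active = [a for a, v in enumerate(row) if v]
        for i in active:
            for j in active:
                counts[i][j] += 1

    # Normalize each row by the number of pairs annotated with action i.
    cond = [[0.0] * num_actions for _ in range(num_actions)]
    for i in range(num_actions):
        total = counts[i][i]
        if total:
            for j in range(num_actions):
                cond[i][j] = counts[i][j] / total
    return cond


# Toy annotations over three hypothetical actions: (hold, ride, sit_on).
toy_labels = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]
cooc = cooccurrence_matrix(toy_labels)
```

In this toy data every pair labeled `sit_on` is also labeled `ride`, so `cooc[2][1]` is 1.0, while the reverse entry `cooc[1][2]` is lower. The matrix is asymmetric in general, which is one reason a conditional estimate may be more informative than raw joint counts.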