Cross-media video event mining based on attention graph structure learning

Chengde Zhang, Yu Lei, Xia Xiao, Xinzhong Chen

doi:10.1016/j.neucom.2022.06.028

Abstract

Cross-media association mining based on heterogeneous information networks (HINs) has received widespread attention. However, a web video is typically described by only a few words, so associations between visual and textual information are sparse. As a result, the heterogeneous graph is inevitably incomplete, which poses a major challenge for event mining. Fortunately, topological relationships can be used to infer correlations between similar nodes. In view of this, a novel framework for web video event mining based on attention graph structure learning is proposed, which generates a new adjacency matrix that reconstructs the associations among nodes. First, a heterogeneous network is constructed and a subgraph is produced separately for each relation. Then, in each relation subgraph, feature graphs are generated from feature similarity to capture potential relationships between nodes. Simultaneously, a semantic graph is created by learning semantic structures to describe the complex heterogeneous interactions between node semantics. Next, these graphs are fused by channel attention to reconstruct the correlations among nodes. Finally, a graph convolutional network (GCN) is applied for web video event mining. Experiments on web videos from YouTube demonstrate that the proposed method is more effective than state-of-the-art methods, with significant improvement.
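
The pipeline summarized in the abstract (feature graph from similarity, attention-weighted fusion of candidate graphs, then graph convolution) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine-similarity threshold, the fixed channel logits (which would be learned in practice), and the toy data are all assumptions made for illustration, and the learned semantic graph is omitted.

```python
import math

def cosine_similarity_graph(features, threshold=0.8):
    """Feature graph: keep edge (i, j) when cosine similarity >= threshold."""
    n = len(features)
    norms = [math.sqrt(sum(v * v for v in row)) for row in features]
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or norms[i] == 0.0 or norms[j] == 0.0:
                continue
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            sim = dot / (norms[i] * norms[j])
            if sim >= threshold:
                graph[i][j] = sim
    return graph

def fuse_channels(graphs, channel_logits):
    """Fuse candidate graphs with softmax weights over the channel logits."""
    peak = max(channel_logits)
    exps = [math.exp(x - peak) for x in channel_logits]
    weights = [e / sum(exps) for e in exps]
    n = len(graphs[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, graphs))
             for j in range(n)] for i in range(n)]

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in looped]
    return [[inv_sqrt[i] * looped[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Toy data (hypothetical): four nodes; nodes 2 and 3 have similar features,
# but the original relation subgraph is missing the edge between them.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
original = [[0.0, 1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

feature_graph = cosine_similarity_graph(features)
# Equal logits stand in for learned channel-attention parameters.
fused = fuse_channels([original, feature_graph], channel_logits=[0.0, 0.0])
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, illustration only
hidden = gcn_layer(normalize_adjacency(fused), features, weight)
```

In this toy case the fused adjacency gains a nonzero entry between nodes 2 and 3, which is the kind of missing association the reconstructed adjacency matrix is meant to recover, while dissimilar nodes such as 0 and 2 stay unconnected.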

Full Text