CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification

Junyu Gao, Changsheng Xu

doi:10.1109/tmm.2020.2969787

Abstract

With the ever-growing number of video categories, Zero-Shot Learning (ZSL) in video classification has drawn considerable attention in recent years. To transfer the learned knowledge from seen categories to unseen categories, most existing methods resort to an implicit model that learns a projection between visual features and semantic category representations. However, such methods ignore the explicit relationships among video instances and categories, which impedes direct information propagation in a Category-Instance graph (CI-graph) consisting of both instances and categories. In fact, exploring the structure of the CI-graph can capture the invariances of the ZSL task with good generality for unseen instances. Inspired by these observations, we propose an end-to-end framework to directly and collectively model the category-instance, category-category, and instance-instance relationships in the CI-graph. Specifically, to construct node features of this graph, we adopt object semantics as a bridge to generate unified representations for both videos and categories. Motivated by the favorable performance of Graph Neural Networks (GNNs), we design a Category-Instance GNN (CI-GNN) to adaptively model the structure of the CI-graph and propagate information among categories and videos. With the task-driven message passing process, the learned model is able to transfer label information from categories to unseen videos. Extensive experiments on four video datasets demonstrate the favorable performance of the proposed framework.
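
To make the idea of propagating information over a graph of categories and videos concrete, the following is a minimal illustrative sketch, not the CI-GNN itself. It assumes that categories and videos already share one feature space (the role the abstract assigns to object semantics), replaces the learned, adaptive graph structure with fixed cosine-similarity edge weights, and involves no training. All function names, the `temperature` and `mix` parameters, and the toy feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def propagate(features, temperature=0.1, mix=0.5):
    """One round of similarity-weighted message passing over all nodes.

    Assumption: edge weights are a softmax over cosine similarities,
    standing in for the learned adjacency described in the abstract.
    """
    updated = []
    for i, f in enumerate(features):
        others = [j for j in range(len(features)) if j != i]  # no self-loop
        weights = softmax([cosine(f, features[j]) / temperature for j in others])
        # Aggregate neighbor features, weighted by edge strength.
        message = [
            sum(w * features[j][d] for w, j in zip(weights, others))
            for d in range(len(f))
        ]
        # Blend each node's own feature with its incoming message.
        updated.append([(1 - mix) * a + mix * m for a, m in zip(f, message)])
    return updated


def classify(category_feats, video_feats, steps=1):
    """Propagate over the joint graph, then assign each video to the
    most similar category node."""
    nodes = category_feats + video_feats
    for _ in range(steps):
        nodes = propagate(nodes)
    cats = nodes[: len(category_feats)]
    vids = nodes[len(category_feats):]
    return [max(range(len(cats)), key=lambda c: cosine(v, cats[c])) for v in vids]


# Toy data (hypothetical): two category nodes and three video nodes.
categories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
videos = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.8, 0.0, 0.2]]
predictions = classify(categories, videos)
```

On this toy data, `predictions` is `[0, 1, 0]`: the first and third videos are assigned to the first category and the second video to the second. Because category and video nodes exchange messages in the same graph, each video's representation is shaped by similar videos as well as by nearby categories, which is the intuition behind modeling all three relationship types jointly.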

Full Text