Abstract

We present a Multi-actor Activity Detection Framework (MADF) that models the interactive relationships among multiple actors for activity detection in extended videos. MADF detects three groups of multi-actor activities, each involving different kinds of actors, in three stages: detection, classification, and post-processing. In the detection stage, both interaction proposals and actor proposals are generated in each video clip to eliminate irrelevant background in the scene. In the classification stage, three different classification networks are proposed to classify the three groups of activities. For person–object interaction, an attention mechanism helps the person–object classification network focus on small-scale objects. For person–person interaction, a suppression module is used to improve the accuracy of person–person activity detection. For person–vehicle interaction, a spatial–temporal graph convolution network (GCN) module is embedded in the person–vehicle classification network to model the fine-grained relationship between the person and the vehicle, and a proposed Mutually Exclusive Category Loss (MECLoss) helps this network distinguish mutually exclusive activities. Finally, off-the-shelf post-processing methods re-score the proposals for more stable results. The proposed system improves substantially on our baseline and achieves state-of-the-art results in the TRECVID 2021 ActEV challenge.
