Abstract

Early action prediction aiming to recognize which classes the actions belong to before they are fully conveyed is a very challenging task, owing to the insufficient discrimination information caused by the domain gaps among different temporally observed domains. Most of the existing approaches focus on using fully observed temporal domains to "guide" the partially observed domains while ignoring the discrepancies between the harder low-observed temporal domains and the easier highly observed temporal domains. The recognition models tend to learn the easier samples from the highly observed temporal domains and may lead to significant performance drops on low-observed temporal domains. Therefore, in this article, we propose a novel temporally observed domain contrastive network, namely, TODO-Net, to explicitly mine the discrimination information from the hard actions samples from the low-observed temporal domains by mitigating the domain gaps among various temporally observed domains for 3-D early action prediction. More specifically, the proposed TODO-Net is able to mine the relationship between the low-observed sequences and all the highly observed sequences belonging to the same action category to boost the recognition performance of the hard samples with fewer observed frames. We also introduce a temporal domain conditioned supervised contrastive (TD-conditioned SupCon) learning scheme to empower our TODO-Net with the ability to minimize the gaps between the temporal domains within the same action categories, meanwhile pushing apart the temporal domains belonging to different action classes. We conduct extensive experiments on two public 3-D skeleton-based activity datasets, and the results show the efficacy of the proposed TODO-Net.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call