CTDA: Contrastive Temporal Domain Adaptation for Action Segmentation

Hongfeng Han, Zhiwu Lu, Ji-Rong Wen

doi:10.1007/978-3-031-27818-1_46

Abstract

Video action segmentation models require sufficient training data. However, human annotation for action segmentation is prohibitively expensive, so only a very limited number of training videos are available. Further, large spatio-temporal variations exist between training and test data. It is therefore critical to learn effective representations from few training videos and to make efficient use of unlabeled test videos. To this end, we present a novel Contrastive Temporal Domain Adaptation (CTDA) framework for action segmentation. Specifically, in the self-supervised learning module, two auxiliary tasks are defined: binary domain prediction and sequential domain prediction. Both are addressed by combining domain adaptation with contrastive learning. Further, a multi-stage architecture is devised to produce the final action segmentation results. Thorough experimental evaluation shows that the CTDA framework achieves the best action segmentation performance.

Full Text