A Temporal Convolutional Network for Weakly Supervised Action Segmentation

Zixuan Zou, Songlin Sun, Junzhe Liu, Jiaqi Zou

doi:10.1109/ic-nidc54101.2021.9660442

Abstract

Weakly supervised video action segmentation is a key problem in video content understanding. In this setting, the ground truth provides only the set of actions that occur in a video, not frame-level labels. A popular family of approaches trains the prediction model within a neural network framework. Our key contribution is a new Hidden Markov Model (HMM) grounded on a Temporal Convolutional Network (TCN) that labels video frames and thereby generates a pseudo-ground truth for the subsequent pseudo-supervised training. At test time, we use the Viterbi algorithm to generate candidate temporal action sequences and select the one with the maximum posterior probability. We evaluate action segmentation performance on the Breakfast dataset. Experiments on this dataset show that our model performs effectively.
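
To make the Viterbi decoding step mentioned in the abstract concrete, the following is a minimal sketch of generic Viterbi decoding for a first-order HMM, written in plain Python. It is not the authors' implementation: the function name `viterbi`, the argument names, and the toy probabilities are all hypothetical, and the frame-wise scores that a TCN would produce are replaced by hand-written numbers. The model in the paper may use additional terms that this sketch omits.

```python
import math


def viterbi(log_emit, log_trans, log_init):
    """Return the most probable state sequence under a first-order HMM.

    log_emit[t][s]  : log-score of state s at frame t (assumed to come
                      from a frame-wise classifier such as a TCN)
    log_trans[p][s] : log-probability of moving from state p to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_frames, n_states = len(log_emit), len(log_init)
    # Best log-score of any path that ends in each state at frame 0.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []  # backptr[t - 1][s] = best predecessor of state s at frame t
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backptr.append(ptr)
    # Backtrack from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (all numbers are illustrative assumptions): two actions,
# four frames, and a strong preference for staying in the same action.
emit = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.6, 0.4]

log_emit = [[math.log(p) for p in row] for row in emit]
log_trans = [[math.log(p) for p in row] for row in trans]
log_init = [math.log(p) for p in init]

path = viterbi(log_emit, log_trans, log_init)
print(path)  # [0, 0, 1, 1]
```

Working in log space avoids numerical underflow on long videos, since the product of many per-frame probabilities becomes a sum of log-scores.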

Full Text