MetaVD: A Meta Video Dataset for enhancing human action recognition datasets

Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi

doi:10.1016/j.cviu.2021.103276

Open Access

Abstract

Numerous practical datasets have been developed for recognizing human actions in videos. However, many of them were constructed by collecting videos from a limited domain; thus, a model trained on one of the existing datasets often fails to accurately classify videos from a different domain. A possible solution to this drawback is to enhance the domain of each action label, i.e., to import videos associated with a given action label from the other datasets and then train a model on the enhanced dataset. To realize this solution, we constructed MetaVD, a meta video dataset built from existing datasets for human action recognition. MetaVD comprises six popular human action recognition datasets, which we integrated by annotating a total of 568,015 relation labels. These relation labels reflect equality, similarity, and hierarchy between the action labels of the original datasets. We further present simple yet effective dataset enhancement methods using MetaVD; experiments on human action classification show that they help train models with higher generalization performance. As a further contribution, we show that analyzing MetaVD can provide useful insights into the original datasets.
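To make the idea of "importing videos" via relation labels concrete, the following is a minimal sketch under stated assumptions; it is not the authors' method or MetaVD's actual file format. The `Relation` schema, the relation-kind strings (`"equal"`, `"similar"`, `"is-a"`), the `enhance` function, and the dataset/label names in the usage example are all hypothetical. The sketch assumes equality and similarity are symmetric, and that an `"is-a"` relation means the first label is a sub-action of the second, so the sub-action's videos may be added to the parent label but not the reverse.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical identifier: an action label is a (dataset name, label name) pair.
Label = tuple[str, str]


@dataclass(frozen=True)
class Relation:
    """One relation label between action labels of two datasets (assumed schema)."""
    a: Label
    b: Label
    kind: str  # assumed values: "equal", "similar", "is-a" (a is a sub-action of b)


def enhance(target: str,
            videos: dict[Label, list[str]],
            relations: list[Relation],
            allowed: frozenset[str] = frozenset({"equal"})) -> dict[str, list[str]]:
    """Map each label of `target` to its own videos plus videos imported via `allowed` relations."""
    enhanced: dict[str, list[str]] = defaultdict(list)
    # Start from the target dataset's own videos.
    for (ds, label), vids in videos.items():
        if ds == target:
            enhanced[label].extend(vids)

    def import_into(dst: Label, src: Label) -> None:
        # Copy videos of `src` into `dst` only when `dst` belongs to the target
        # dataset and `src` comes from a different dataset.
        if dst[0] == target and src[0] != target:
            enhanced[dst[1]].extend(videos.get(src, []))

    for rel in relations:
        if rel.kind not in allowed:
            continue
        if rel.kind == "is-a":
            # A sub-action's videos are also examples of its parent action.
            import_into(rel.b, rel.a)
        else:
            # "equal" and "similar" are treated as symmetric relations.
            import_into(rel.a, rel.b)
            import_into(rel.b, rel.a)
    return dict(enhanced)


if __name__ == "__main__":
    # Toy, made-up datasets "A" and "B" with video IDs.
    videos = {
        ("A", "play_guitar"): ["a1"],
        ("B", "playing_guitar"): ["b1", "b2"],
        ("B", "strum_guitar"): ["b3"],
        ("A", "jog"): ["a2"],
        ("B", "run"): ["b4"],
    }
    relations = [
        Relation(("A", "play_guitar"), ("B", "playing_guitar"), "equal"),
        Relation(("B", "strum_guitar"), ("A", "play_guitar"), "is-a"),
        Relation(("A", "jog"), ("B", "run"), "similar"),
    ]
    print(enhance("A", videos, relations))
    print(enhance("A", videos, relations, frozenset({"equal", "is-a", "similar"})))
```

The `allowed` parameter reflects a design choice in this sketch: importing only through equality is the most conservative option, while admitting similarity or hierarchy relations adds more (but potentially noisier) training videos to each label.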