Abstract
Recent advances in big data systems and databases have made it possible to gather raw unlabeled data at unprecedented rates. However, labeling such data is a costly and time-consuming process. This is especially true for video data, and in particular for human activity recognition (HAR) tasks. For this reason, methods for reducing the need for labeled data in HAR applications have drawn significant attention from the research community. Two popular approaches developed to address this issue are data augmentation and domain adaptation. The former uses problem-specific, hand-crafted data synthesizers to augment the training dataset with artificial labeled data instances. The latter extracts knowledge from distinct but related supervised learning tasks for which labeled data is more abundant than for the problem at hand. Both methods have been extensively studied and used successfully on various tasks, but a comprehensive comparison of the two has not been carried out in the context of video-based HAR. In this work, we fill this gap by providing experimental results comparing data augmentation and domain adaptation techniques on a cross-viewpoint human activity recognition task from pose information.
Highlights
One of the most common and serious problems when trying to train a supervised learning model is the lack of a sufficient amount of labeled data
Two classes of methods have been widely adopted to deal with this issue, namely data augmentation and domain adaptation, and this work compares them within the area of human activity recognition
The rest of this paper is organized as follows: Section 2 presents related work, the adopted generic human activity recognition (HAR) approach, and the proposed approach, which has two distinct variations: classification with viewpoint data augmentation and semi-supervised domain adaptation
Summary
One of the most common and serious problems when training a supervised learning model is the lack of a sufficient amount of labeled data. For several tasks of practical and research interest within the broader area of computer vision, for example, image/video classification, collecting an adequate amount of labeled data is either infeasible or too costly. As demonstrated in Reference [1], for several tasks, the performance of models may only increase logarithmically with the volume of available training data. For these reasons, much recent research has been devoted to methods that are robust to insufficient labeled data. Two classes of methods have been widely adopted to deal with this issue, namely data augmentation and domain adaptation, and this work compares them within the area of human activity recognition.
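To make the idea of viewpoint data augmentation on pose data concrete, the following is a minimal sketch and not the procedure used in this work. It assumes a pose is a list of 3D joint coordinates with y as the vertical axis, and that a new camera viewpoint can be approximated by rotating the skeleton about that axis. The names `rotate_about_vertical`, `augment`, and `toy_dataset` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

# Assumption: a joint is an (x, y, z) triple and y is the vertical axis.
Joint = Tuple[float, float, float]
Sample = Tuple[List[Joint], str]  # (pose, activity label)


def rotate_about_vertical(pose: List[Joint], angle_deg: float) -> List[Joint]:
    """Rotate every joint about the vertical (y) axis by angle_deg degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]


def augment(dataset: List[Sample], angles_deg: List[float]) -> List[Sample]:
    """Return the original samples plus one rotated copy per angle.

    The activity label is carried over unchanged, since rotating the
    skeleton is assumed not to change the activity being performed.
    """
    augmented = list(dataset)
    for pose, label in dataset:
        for angle in angles_deg:
            augmented.append((rotate_about_vertical(pose, angle), label))
    return augmented


# Hypothetical toy sample: a two-joint "skeleton" labeled "wave".
toy_dataset: List[Sample] = [([(1.0, 0.0, 0.0), (0.0, 1.5, 0.0)], "wave")]
augmented = augment(toy_dataset, [45.0, 90.0])
```

Each original sample yields one synthetic sample per rotation angle, so the labeled set grows without any additional manual annotation. Whether such synthetic viewpoints are realistic enough to help a classifier is exactly the kind of question an experimental comparison has to settle.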