AbstractIn conventional machine learning (ML), a fundamental assumption is that the training and test sets share identical feature distributions, a reasonable premise drawn from the same dataset. However, real-world scenarios often defy this assumption, as data may originate from diverse sources, causing disparities between training and test data distributions. This leads to a domain shift, where variations emerge between the source and target domains. This study delves into human action recognition (HAR) models within an unconstrained, real-world setting, scrutinizing the impact of input data variations related to contextual information and video encoding. The objective is to highlight the intricacies of model performance and interpretability in this context. Additionally, the study explores the domain adaptability of HAR models, specifically focusing on their potential for re-identifying individuals within uncontrolled environments. The experiments involve seven pre-trained backbone models and introduce a novel analytical approach by linking domain-related (HAR) and domain-unrelated (re-identification (re-ID)) tasks. Two key analyses addressing contextual information and encoding strategies reveal that maintaining the same encoding approach during training results in high task correlation while incorporating richer contextual information enhances performance. A notable outcome of this study is the comprehensive evaluation of a novel transformer-based architecture driven by a HAR backbone, which achieves a robust re-ID performance superior to state-of-the-art (SOTA). However, it faces challenges when other encoding schemes are applied, highlighting the role of the HAR classifier in performance variations.
Read full abstract