Abstract

Many applications of action recognition, especially broad domains like surveillance or anomaly-detection, favor unsupervised methods considering that exhaustive labeling of actions is not possible. However, very limited work has happened in this domain. Moreover, the existing self-supervised approaches suffer from their dependency upon labeled data for finetuning. To this end, this paper puts forward a manifold based unsupervised pose-sequence recognition approach that leverages only the natural biases present in the data. It works by clustering the projections of temporal derivatives of the fragmented data on the Grassmann manifold. Temporal derivatives are formed by the inter-frame gradients with local and global metrics. To commensurate with this, a dynamic view-invariant pose representation is proposed. Additionally, a variable aggregation step is introduced for better feature vector quantization. Extensive empirical evaluation and ablations on several challenging datasets under three categories confirm the superiority of the proposed approach in contrast to current methods.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call