In this paper, we propose a technique for hierarchical yoga pose classification (YPC) in a multi-stage, multi-task framework. Training follows a three-stage, transfer-learning-based, end-to-end methodology. The novelty lies in (a) a supervised contrastive combined loss function for stage-1 training, (b) an Encoder–Decoder network architecture with an attention mechanism for stage-3 training, and (c) a spatial-context-aware multi-task combined loss function for stage-3 training. First, for stage-1 training, we propose a linear combination of three loss functions (cross-entropy, self-supervised contrastive loss, and supervised contrastive loss) optimized in a multi-task manner. We introduce radial and cosine margins into the formulation of the self-supervised and supervised contrastive losses to pull feature embeddings of the same class closer together than feature embeddings of different classes. The weights learned in stage-1 training are subsequently fine-tuned with a cross-entropy multi-task loss in stage 2. These stage-2 weights are then transferred and further fine-tuned in stage-3 training, for which we propose a spatial-context-aware multi-task combined loss function. This loss function leverages the fine-grained spatial features obtained from HiResCAM. These are processed in parallel with features obtained from XGrad-CAM (which satisfy the sensitivity and conservation axioms) to further supervise the learning of the hierarchical yoga pose classifier. We demonstrate our methodology on the publicly available large-scale Yoga-82 dataset. We report peak Top-1 YPC accuracy of 95.89% over 6 pose classes (Yoga-6), 93.85% over 20 pose classes (Yoga-20) and 90.0% over 82 pose classes (Yoga-82).
Our proposed method improves Top-1 classification accuracy by 6.1% in the Yoga-6 hierarchy, 9.3% in the Yoga-20 hierarchy, and 10.9% in the Yoga-82 hierarchy over the state-of-the-art (SOTA) methodology. We achieve the current best Top-1 classification accuracy in all three YPC hierarchies.
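The stage-1 objective described above, a linear combination of cross-entropy, self-supervised contrastive, and supervised contrastive losses, can be illustrated with a minimal NumPy sketch. The abstract does not state the loss weights, the temperature, or the exact form of the margins, so everything specific here is an assumption: the weights `w_ce`, `w_ssl`, `w_sup`, the temperature of 0.1, and a cosine margin modeled as a constant subtracted from positive-pair similarities. The radial margin is omitted because its form is not given. The function names are hypothetical and this should not be read as the authors' implementation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy from raw class scores, via a stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive_loss(emb, pos_mask, temperature=0.1, cos_margin=0.0):
    """Contrastive loss over a batch of embeddings.

    emb:      (N, D) feature embeddings.
    pos_mask: (N, N) boolean, True where sample j is a positive for anchor i.
    cos_margin is an ASSUMED margin form: it lowers the cosine similarity of
    positive pairs, so they must be closer to reach the same loss value.
    """
    n = len(emb)
    not_self = ~np.eye(n, dtype=bool)
    pos_mask = pos_mask & not_self                  # an anchor is never its own positive
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T                               # cosine similarities in [-1, 1]
    logits = (sim - cos_margin * pos_mask) / temperature
    exp = np.exp(logits) * not_self                 # bounded inputs, so no overflow
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))
    n_pos = pos_mask.sum(axis=1)
    has_pos = n_pos > 0                             # skip anchors without positives
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[has_pos] / n_pos[has_pos]
    return per_anchor.mean()

# Toy batch: 4 images, 2 augmented views each, 6 classes (as in Yoga-6).
rng = np.random.default_rng(0)
image_ids = np.tile(np.arange(4), 2)                # view k and k+4 share an image
labels = np.array([0, 0, 1, 1])[image_ids]
emb = rng.normal(size=(8, 16))                      # stand-in for encoder features
logits = rng.normal(size=(8, 6))                    # stand-in for classifier scores

ssl_pos = image_ids[:, None] == image_ids[None, :]  # positives: views of the same image
sup_pos = labels[:, None] == labels[None, :]        # positives: samples with the same label

w_ce, w_ssl, w_sup = 1.0, 0.5, 0.5                  # assumed weights, not from the paper
total = (w_ce * cross_entropy(logits, labels)
         + w_ssl * contrastive_loss(emb, ssl_pos, cos_margin=0.2)
         + w_sup * contrastive_loss(emb, sup_pos, cos_margin=0.2))
print(f"combined stage-1 loss: {total:.4f}")
```

The two contrastive terms differ only in which pairs count as positives: augmented views of the same image in the self-supervised case, and all samples sharing a label in the supervised case. Under the assumed margin form, increasing `cos_margin` can only raise the loss for a fixed set of embeddings, which pushes the encoder to place same-class embeddings closer together.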