Abstract

Human activity recognition (HAR) using wearable sensors has long been a research hotspot in ubiquitous computing, and feature learning plays a crucial role in it. Recent years have witnessed the outstanding success of contrastive learning on image data, where invariant representations are learned by adding a contrastive loss to the last layer of a deep neural network. However, the advantages of contrastive loss have rarely been leveraged on time series data for activity recognition. A fundamental obstacle to contrastive learning in HAR is that image-based augmentation does not fit sensor data well, which raises a critical issue: the distortions induced by augmentation may be further amplified by the intermediate layers of a network and thus severely harm the semantic structure of the original activity instance. In this paper, taking inspiration from deeply-supervised learning, we propose a novel approach called Contrastive Supervision that considers “where” to contrast. It aims to learn invariances to time series augmentation by pulling positive pairs together and pushing negative pairs apart at different depths of the neural network. Our approach can be seen as a generalization of contrastive learning to a deeply-supervised setting, in which the contrastive loss supervises the intermediate layers instead of only the last layer, allowing us to leverage label information effectively and better fuse the multi-level features. Experiments on popular benchmarks demonstrate that our approach learns better representations and improves classification accuracy, without additional inference cost, for various HAR tasks in both supervised and semi-supervised learning paradigms.
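The idea of contrasting at several depths rather than only at the last layer can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: a label-aware (supervised-contrastive style) loss, equal weights per depth, a temperature of 0.1, and the hypothetical function names `sup_contrastive_loss` and `contrastive_supervision_loss`. Feature vectors are plain Python lists standing in for the projected outputs of intermediate layers.

```python
import math


def _normalize(v):
    # L2-normalize a vector; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sup_contrastive_loss(features, labels, temperature=0.1):
    """Label-aware contrastive loss for one batch of feature vectors.

    Samples sharing a label are positives; all other samples are negatives.
    (Assumed form; the abstract does not specify the loss.)
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled cosine similarity between anchor i and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def contrastive_supervision_loss(features_per_depth, labels, weights=None,
                                 temperature=0.1):
    """Weighted sum of the contrastive loss applied at several network depths."""
    if weights is None:
        weights = [1.0] * len(features_per_depth)  # assumed: equal weighting
    return sum(
        w * sup_contrastive_loss(f, labels, temperature)
        for w, f in zip(weights, features_per_depth)
    )


# Toy batch: four windows, two activity classes, features from two depths.
labels = [0, 0, 1, 1]
shallow = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # classes mixed
deep = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]     # classes separated
loss = contrastive_supervision_loss([shallow, deep], labels)
```

In this toy batch the shallow features contribute most of the loss because same-class windows are far apart there, which is the signal that supervising intermediate layers is meant to provide. In a real model the per-depth features would come from projection heads used only during training, which is consistent with the abstract's claim of no additional inference cost.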
