Abstract

Recent years have witnessed the great success of deep convolutional networks in sensor-based human activity recognition (HAR), yet their practical deployment remains a challenge because the computational budget required to obtain a reliable prediction varies. This article approaches adaptive inference from the novel perspective of signal frequency, motivated by the intuition that low-frequency features are enough for recognizing "easy" activity samples, while only "hard" activity samples need temporally detailed information. We propose an adaptive resolution network that combines a simple subsampling strategy with conditional early exit. Specifically, it comprises multiple subnetworks operating at different resolutions: "easy" activity samples are first classified by a lightweight subnetwork using the lowest sampling rate, and the subsequent higher-resolution subnetworks are applied sequentially whenever the previous one fails to reach a confidence threshold. This dynamic decision process adaptively selects a proper sampling rate for each activity sample, conditioned on the input under a varying budget, and terminates as soon as sufficient confidence is obtained, hence avoiding excessive computation. Comprehensive experiments on four diverse HAR benchmark datasets demonstrate the effectiveness of our method in terms of the accuracy-cost tradeoff. We also benchmark the average latency on real hardware.
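
The coarse-to-fine decision process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `subsample` and `adaptive_predict`, the stride values, the default threshold of 0.9, and the use of the maximum softmax probability as the confidence score are all assumptions made for illustration, since the abstract specifies only "a simple subsampling strategy" and "a confidence threshold".

```python
import math
from typing import Callable, List, Sequence, Tuple

# A classifier maps a (subsampled) signal to raw class scores (logits).
Classifier = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def subsample(signal: Sequence[float], stride: int) -> List[float]:
    """Keep every `stride`-th sample (assumed form of the subsampling step)."""
    return list(signal[::stride])


def adaptive_predict(
    signal: Sequence[float],
    stages: List[Tuple[int, Classifier]],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> Tuple[int, int, float]:
    """Run stages from lowest to highest resolution and exit early.

    `stages` is a list of (stride, classifier) pairs ordered from the
    largest stride (cheapest) to the smallest (most detailed).
    Returns (predicted label, index of the exiting stage, confidence).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for index, (stride, classifier) in enumerate(stages):
        probs = softmax(classifier(subsample(signal, stride)))
        confidence = max(probs)  # assumed confidence measure
        label = probs.index(confidence)
        if confidence >= threshold:
            break  # confident enough: skip the more expensive stages
    # If no stage reached the threshold, the last stage's answer is used.
    return label, index, confidence
```

With two stages such as `[(4, coarse_net), (1, full_net)]`, where both classifiers are hypothetical placeholders, a sample that the coarse stage classifies confidently never reaches the full-resolution stage, which is where the computational saving would come from.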
