Abstract

The diffusion of vision sensor nodes across a wide range of applications has given rise to higher computational demand at the edge of the Internet of Things (IoT). Indeed, in-node video sense-making has become essential in the form of high-level tasks such as object detection for visual monitoring, mitigating the data deluge from the wireless network down to the cloud storage level. In such applications, deep neural networks are well known to be a prime choice in view of their performance and flexibility. However, these properties come at the cost of high computational requirements at inference time, which directly hamper the power efficiency, lifetime, and cost of self-powered edge devices. In this paper, a computationally efficient inference technique is introduced to perform the ubiquitously required task of bounding box-based object detection. The proposed method leverages the correlation among frames in the temporal dimension, requires only minor memory overhead for intermediate feature-map storage and minor architectural changes, and does not require any retraining, enabling immediate deployment in existing vision frameworks. The proposed method achieves 18.3% (35.8%) computation reduction at 3.3% (3.2%) memory overhead and 3.8% (6.8%) accuracy drop in the YOLOv1 (VGG16) and SSD (VGG16) neural networks, respectively, on the CAMEL dataset.
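The core idea, reusing intermediate feature maps where consecutive frames are strongly correlated, can be illustrated with a minimal sketch. All names and thresholds below (tile size, difference threshold, the toy feature extractor) are illustrative assumptions, not details taken from the paper: each frame is split into tiles, and a tile's cached features are reused whenever the tile has changed little since the previous frame.

```python
import numpy as np

TILE = 8       # tile side in pixels (assumed granularity)
THRESH = 4.0   # mean absolute pixel difference that triggers recompute (assumed)

def conv_features(tile):
    """Stand-in for an expensive convolutional feature extractor."""
    return tile.astype(np.float32).mean()  # toy feature: mean intensity

def detect(frame, prev_frame, cache):
    """Update per-tile features, recomputing only tiles that changed."""
    h, w = frame.shape
    recomputed = 0
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            tile = frame[y:y+TILE, x:x+TILE]
            key = (y, x)
            if prev_frame is not None and key in cache:
                diff = np.abs(tile.astype(np.int16)
                              - prev_frame[y:y+TILE, x:x+TILE]).mean()
                if diff < THRESH:
                    continue  # frame region unchanged: reuse cached features
            cache[key] = conv_features(tile)
            recomputed += 1
    return cache, recomputed

rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, (32, 32), dtype=np.uint8)
f1 = f0.copy()
f1[:8, :8] = rng.integers(0, 256, (8, 8), dtype=np.uint8)  # one moving region

cache, n0 = detect(f0, None, {})   # first frame: all 16 tiles computed
cache, n1 = detect(f1, f0, cache)  # second frame: only the changed tile
print(n0, n1)
```

In a real detector the cached quantity would be the convolutional feature maps of the backbone (e.g., VGG16) rather than a scalar, which is where the paper's reported memory overhead for intermediate feature-map storage comes from.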
