Abstract
The diffusion of vision sensor nodes across a wide range of applications has raised computational demand at the edge of the Internet of Things (IoT). In-node video sense-making has become essential in the form of high-level tasks such as object detection for visual monitoring, mitigating the data deluge from the wireless network to cloud storage. In such applications, deep neural networks are a prime choice in view of their performance and flexibility. However, these properties come at the cost of high computational requirements at inference time, which reduce the power efficiency and lifetime, and increase the cost, of self-powered edge devices. In this paper, a computationally efficient inference technique is introduced to perform the ubiquitously required task of bounding box-based object detection. The proposed method leverages the correlation among frames in the temporal dimension, requires only minor memory overhead for intermediate feature map storage and minimal architectural changes, and needs no retraining, enabling immediate deployment in existing vision frameworks. The proposed method achieves 18.3% (35.8%) computation reduction at 3.3% (3.2%) memory overhead, with a 3.8% (6.8%) accuracy drop, in the YOLOv1 (VGG16) and SSD (VGG16) neural networks on the CAMEL dataset.
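The core idea of temporal-correlation-based reuse can be illustrated with a minimal sketch: cache per-tile intermediate features from the previous frame and recompute them only where the frame content has changed beyond a threshold. All names, tile sizes, and the threshold below are illustrative assumptions, and `conv_features` is a hypothetical stand-in for an expensive convolutional-layer computation; this is not the paper's actual implementation.

```python
import numpy as np

TILE = 8          # tile size in pixels (illustrative choice)
THRESH = 10.0     # mean-absolute-difference threshold (illustrative choice)

def conv_features(tile):
    """Hypothetical stand-in for an expensive conv-layer computation on one tile."""
    return tile.mean()

def detect_with_reuse(frame, prev_frame, cache):
    """Recompute features only for tiles that changed since the previous frame.

    Unchanged tiles reuse the cached feature map, trading a small memory
    overhead (the cache) for reduced computation on temporally correlated video.
    """
    h, w = frame.shape
    feats = {}
    recomputed = 0
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            cur = frame[y:y + TILE, x:x + TILE]
            ref = prev_frame[y:y + TILE, x:x + TILE]
            key = (y, x)
            if key not in cache or np.abs(cur - ref).mean() > THRESH:
                cache[key] = conv_features(cur)   # changed tile: recompute
                recomputed += 1
            feats[key] = cache[key]               # unchanged tile: reuse cache
    return feats, recomputed
```

On a static scene, every call after the first recomputes zero tiles, which is the source of the computation savings the abstract reports; the cache holding one feature map per tile is the corresponding memory overhead.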