Abstract

In the Internet of Things era, where many interconnected and heterogeneous mobile and fixed smart devices coexist, distributing intelligence from the cloud to the edge has become a necessity. Due to their limited computational and communication capabilities, low memory, and tight energy budgets, bringing artificial intelligence algorithms to peripheral devices, such as the end-nodes of a sensor network, is a challenging task that requires the design of innovative solutions. In this work, we present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms. PhiNets are based on inverted residual blocks specifically designed to decouple computational cost, working memory, and parameter memory, thus exploiting all the resources available on a given platform. Coupled with a YOLOv2 detection head and Simple Online and Realtime Tracking (SORT), the proposed architecture achieves state-of-the-art results in (i) detection on the COCO and VOC2012 benchmarks and (ii) tracking on the MOT15 benchmark. PhiNets reduce the parameter count by around 90% with respect to previous state-of-the-art models (EfficientNetv1, MobileNetv2) while achieving better performance at lower computational cost. Moreover, we demonstrate our approach on a prototype node based on an STM32H743 microcontroller (MCU) with 2 MB of internal Flash and 1 MB of RAM, achieving power requirements on the order of 10 mW. The code for PhiNets is publicly available on GitHub.
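
As context for the abstract: PhiNets build on inverted residual blocks. The sketch below shows a generic MobileNetV2-style inverted residual block, not the authors' exact PhiNet block; the expansion factor, the ReLU6 activation, and the layer ordering are assumptions following the standard inverted-residual pattern the abstract refers to.

```python
# A minimal sketch of a MobileNetV2-style inverted residual block, the
# family of blocks the abstract says PhiNets build on. This is NOT the
# authors' exact PhiNet block; expansion factor, activation, and layer
# ordering follow the generic inverted-residual design.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expansion: int = 6):
        super().__init__()
        hidden = in_ch * expansion
        # Residual shortcut only when input and output shapes match.
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion.
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (one filter per channel).
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise linear projection back down.
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.block(x)
        return x + y if self.use_residual else y
```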

Highlights

  • Over the past decade, we have witnessed two parallel trends: increasingly capable, low-power embedded systems, and detection and tracking pipelines that demand ever more computational resources

  • We prove the efficiency of PhiNets by comparing them with existing lightweight backbones, combined with a YOLOv2 [30] detection head and the Simple Online and Realtime Tracking (SORT) tracker [1]

  • To evaluate object detection performance towards tiny multi-object tracking, we trained EfficientNets, MobileNets, and PhiNets sized between 1M and 10M Multiply-Accumulate operations (MACs) on a subset of the MS COCO [25] and VOC2012 [9] object detection benchmarks (a sketch of how per-layer MACs are counted follows this list)
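
The 1M-10M MAC budgets in the last highlight count multiply-accumulate operations over the whole network. As a hedged illustration (a standard counting rule, not code from the paper; `conv2d_macs` is a hypothetical helper name), the per-layer MAC count of a convolution can be estimated as follows:

```python
def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int,
                kernel: int, groups: int = 1) -> int:
    """Multiply-accumulate count of one Conv2d layer.

    Standard counting rule: each output element costs
    kernel * kernel * (c_in / groups) multiply-accumulates.
    groups == c_in gives the depthwise case.
    """
    return h_out * w_out * c_out * kernel * kernel * (c_in // groups)

# Example: a 3x3 convolution on a 56x56 feature map.
print(conv2d_macs(56, 56, 32, 64, 3))            # 57,802,752 MACs (standard)
print(conv2d_macs(56, 56, 32, 32, 3, groups=32)) # 903,168 MACs (depthwise)
```

In this example the depthwise case is 64x cheaper than the standard convolution, which is why inverted residual blocks lean so heavily on depthwise convolutions to stay within million-MAC budgets.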


Summary

INTRODUCTION

Over the past decade, we have witnessed two parallel trends. On one side, the increasing popularity of the Internet of Things, i.e., intelligent networked things everywhere, is a consequence of the growing capabilities of embedded systems, enhanced with capable processing units working at ever-increasing frequencies and offering attractive low-power modes [11, 12, 27]. On the other side, current best-performing pipelines for multi-object detection and tracking require substantial computational resources, severely limiting the application scenarios in which such techniques can be exploited. One-stage detectors are the most efficient from a computational complexity perspective and are therefore the go-to solution for lightweight, real-time object detection.

Our work contributes to the state of the art by proposing a novel scalable backbone, PhiNets, for detection and multi-object tracking on resource-constrained platforms. We argue why our convolutional block is computationally cheaper than current state-of-the-art solutions and how, given the architecture of PhiNets, we can perform a one-shot search of the optimal parameters for every computational constraint set (i.e., every target platform). In summary, our contributions address:

  • embedded vision processing, by proposing a new architecture family, PhiNets, which pushes forward the state of the art in object detection on tiny devices;
  • low-power image processing, since our pipeline requires only 1.3 mJ per frame, or 13 mW at 10 fps.
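
The two power figures quoted above are mutually consistent: average power is per-frame energy multiplied by the frame rate,

$$P = E_{\text{frame}} \cdot f = 1.3\,\text{mJ} \times 10\,\text{s}^{-1} = 13\,\text{mW}.$$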

Scalable backbones
Detection methods
Tracking methods
Vision-based MCU applications
PHINETS ARCHITECTURE
Detection and tracking
Multi-Object Tracking
Power consumption
Findings
CONCLUSION
