Abstract

We present AnchorCapsule, a post-processor for object detection that relieves the workload of CPUs at minimal cost in latency. The capsule follows a ‘datastream-serving’ architecture paradigm, in which the computing logic is built to match the throughput of DRAM datastreams while minimizing the use of on-chip buffers. Implementation results show that AnchorCapsule occupies a tiny area of 0.038 mm² yet runs one order of magnitude faster than an Intel Xeon and two orders of magnitude faster than an ARM A53, reducing end-to-end system-level latency by 46.7% and 16.7% for the Yolo-v3-tiny and Yolo-v3 networks, respectively. The precision of AnchorCapsule has been validated on 1,500 samples from three well-known datasets, giving over 98% accuracy in bounding box (BBox) coordinates, over 99% in BBox sizes, and 100% in object types. Compared with the state of the art, AnchorCapsule can filter 9,408 candidate BBoxes in a single run, 3.5× the processing capacity of the best-known published work.
