Abstract

Non-maximum suppression (NMS) is an essential post-processing step in object detectors. Most object detection models rely on an NMS algorithm to filter out overlapping candidate boxes that belong to the same object and to retain the box that best represents the object as its representative box. However, hardware implementations of the standard NMS algorithm have several disadvantages: high complexity (especially when the number of boxes is large), high latency, large area, and high power consumption. To solve these problems, we propose an efficient parallel hardware architecture for a new NMS algorithm called distance over side-NMS (DoS-NMS). The architecture uses a new sorting circuit with a ping-pong buffer and a new retention mechanism for candidate boxes. It also uses a processing element (PE) group with a voting mechanism to simplify the algorithm, which reduces latency and area. Additionally, the PE group computes the DoS, a metric that is not based on intersection over union (IoU), between the candidate box and multiple object representatives in parallel, which significantly reduces algorithm complexity and memory access cost. Experiments indicate that the algorithm runs on a chip with an area of 0.75 mm² and a power consumption of 68.41 mW, and that its normalized area efficiency is 3.72 and 4.84 times higher than that of two state-of-the-art methods, respectively.
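
For orientation, the standard NMS baseline that the abstract refers to can be sketched in software as follows. This is a minimal illustration of conventional greedy, IoU-based NMS, not of DoS-NMS: the abstract does not define the DoS metric, so it is not reproduced here. The function names (`iou`, `greedy_nms`), the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions chosen for the example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the retained (representative) boxes.

    Candidates are visited in descending score order. A candidate is kept
    only if its IoU with every already-kept representative is at or below
    the threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Each candidate is compared against all current representatives.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

The sort and the per-candidate comparison against every kept representative are the two steps whose cost grows with the number of boxes. In this sketch both run sequentially, which is the kind of overhead the abstract's sorting circuit and parallel PE group are described as addressing.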
