Abstract

Object detection is a popular image-processing technique, widely used in numerous applications for detecting and locating objects in images or videos. While being one of the fastest algorithms for object detection, Single-shot Multibox Detection (SSD) networks are also computationally very demanding, which limits their usage in real-time edge applications. Even though the SSD post-processing algorithm is not the most-complex segment of the overall SSD object-detection network, it is still computationally demanding and can become a bottleneck with respect to processing latency and power consumption, especially in edge applications with limited resources. When using hardware accelerators to accelerate backbone CNN processing, the SSD post-processing step implemented in software can become the bottleneck for high-end applications where high frame rates are required, as this paper shows. To overcome this problem, we propose Puppis, an architecture for the hardware acceleration of the SSD post-processing algorithm. As the experiments showed, our solution led to an average SSD post-processing speedup of 33.34-times when compared with a software implementation. Furthermore, the execution of the complete SSD network was on average 36.45-times faster than the software implementation when the proposed Puppis SSD hardware accelerator was used together with some existing CNN accelerators.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.