Abstract

This paper presents an accelerated face detection algorithm for Coarse-Grained Reconfigurable Architectures (CGRAs). Face detection algorithms typically use several stages of cascaded detectors and require a large amount of feature data, whereas general-purpose processors have little internal memory. Because external memory has much higher latency than internal memory, efficient memory use is essential for fast face detection. We run the first stage of the cascade on every line using feature data held in internal memory, and then run the subsequent stages only on the candidates that pass the first stage. Software pipelining and vectorization accelerate the detection process further. The proposed method was implemented on a CGRA running at 400 MHz and achieves 15.1 frames per second at a resolution of 800×600.
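
The two-pass structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, and the toy threshold classifiers are all hypothetical stand-ins for real cascade stages, and the software pipelining and vectorization steps are not shown. The sketch only shows the control flow in which the first stage runs over every window of a line, and later stages (whose feature data would be fetched from slower external memory) run only on the surviving candidates.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
# A stage takes (image, row, col) of a window's top-left corner and accepts or rejects it.
Stage = Callable[[Image, int, int], bool]

WIN = 2  # hypothetical window size, chosen only to keep the example small


def window_sum(image: Image, r: int, c: int) -> int:
    """Sum of pixel values in the WIN x WIN window at (r, c)."""
    return sum(image[r + dr][c + dc] for dr in range(WIN) for dc in range(WIN))


def detect_two_pass(image: Image, first_stage: Stage,
                    later_stages: List[Stage]) -> List[Tuple[int, int]]:
    """Run stage 1 on every window of each line, then later stages on survivors."""
    rows, cols = len(image), len(image[0])
    detections = []
    for r in range(rows - WIN + 1):
        # Pass 1: only the first-stage features are needed here, so they can
        # stay resident in internal memory for the whole line (assumption).
        candidates = [c for c in range(cols - WIN + 1) if first_stage(image, r, c)]
        # Pass 2: later-stage features are needed only for the few candidates.
        for c in candidates:
            if all(stage(image, r, c) for stage in later_stages):
                detections.append((r, c))
    return detections


# Toy stages (hypothetical): a loose threshold first, a stricter one afterwards.
def loose_stage(image: Image, r: int, c: int) -> bool:
    return window_sum(image, r, c) >= 18


def strict_stage(image: Image, r: int, c: int) -> bool:
    return window_sum(image, r, c) >= 36


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
result = detect_two_pass(image, loose_stage, [strict_stage])
print(result)  # [(1, 1)]
```

In this toy input, five of the nine windows pass the loose first stage, but only the window at (1, 1) passes the strict stage, so the later stage is evaluated five times instead of nine. In a real cascade the first stage rejects most windows, which is why deferring the later stages reduces external memory traffic.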
