Abstract

Because the scale and the horizontal and vertical coordinates of an object in an image are arbitrary, object detection can be viewed as a search for the object in the 3D space spanned by scale, horizontal position, and vertical position. Traditional sliding-window methods must search this 3D space exhaustively, which incurs a prohibitive computational cost. To address this problem, we propose to exploit both the scaling capacity and the translation capacity of an object detector to accelerate detection without loss of accuracy. In our paradigm, scaling capacity removes the need to apply templates of all possible sizes at the first stage: only a few templates, each covering a large range of target object sizes, are used to coarsely locate the targets. Similarly, translation capacity avoids dense grid sampling at the very beginning. After this initial estimation, further evaluations with templates of finer scales are carried out around the candidates to verify the existence of target objects. Moreover, in contrast to traditional uniform grid scanning, we present an interlaced scanning method, called diamond grid scanning, that reduces redundant evaluations. Experimental results on face detection demonstrate the advantage of our method.
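
To make the contrast between uniform and interlaced scanning concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the exact diamond pattern, so the sketch assumes a checkerboard (quincunx) lattice that keeps every other position of the uniform grid, so that each kept position has diamond-shaped neighbours. The function names `uniform_grid` and `diamond_grid` and the `stride` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def uniform_grid(width: int, height: int, stride: int) -> List[Point]:
    """Window positions on a regular grid: every stride-th pixel in x and y."""
    return [
        (x, y)
        for y in range(0, height, stride)
        for x in range(0, width, stride)
    ]


def diamond_grid(width: int, height: int, stride: int) -> List[Point]:
    """Interlaced positions (assumed pattern, not taken from the paper).

    Keeps a uniform-grid position only when its row index plus its
    column index is even, which yields a checkerboard lattice with
    roughly half as many positions as the uniform grid.
    """
    points: List[Point] = []
    for j, y in enumerate(range(0, height, stride)):
        for i, x in enumerate(range(0, width, stride)):
            if (i + j) % 2 == 0:
                points.append((x, y))
    return points
```

Under this assumption, an 8 x 8 image scanned with a stride of 2 gives 16 uniform positions and 8 interlaced ones. Whether the skipped positions are still covered depends on the translation capacity of the detector, which the sketch does not model.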
