Abstract
As a typical machine-learning-based detection technique, deformable part models (DPM) have achieved great success in detecting complex object categories. The heavy computational burden of DPM, however, severely restricts their use in many real-world applications. In this work, we accelerate DPM via parallelization and hypothesis pruning. First, we implement the original DPM approach on a GPU platform and parallelize it, making it 136 times faster than DPM release 5 without loss of detection accuracy. Furthermore, we use a mixture root template as a prefilter for hypothesis pruning and achieve a speedup of more than 200 times over DPM release 5, which is, to our knowledge, the fastest implementation of DPM to date. The performance of our method has been validated on the Pascal VOC 2007 and INRIA pedestrian datasets and compared with other state-of-the-art techniques.
Highlights
Detecting objects in visual media is important for many computer vision tasks
We evaluated a range of deformable part model (DPM) acceleration methods on the Pascal VOC 2007 dataset: (1) DPM release 5 [24], (2) cascade DPM [13], (3) branch-and-bound DPM [19], (4) FFT DPM [17], (5) coarse-to-fine DPM [16], (6) fastest DPM [20], and (7) our approach
We report the average precision (AP) of all 20 object categories in Table 2, where DPM-GPU-P is the proposed method in this work and DPM-C++-omp is the multicore C++ version of DPM release 5
Summary
Detecting objects in visual media is important for many computer vision tasks. Object detection is usually the first step in understanding an image. After the templates (filters) have been trained, at detection time the original DPM evaluates the appearance score densely at every image position and scale by calculating the correlation between filters and feature maps. These two factors result in a heavy computational burden for DPM, but this burden can potentially be largely relieved. Our GPU version of DPM, which we call DPM-GPU, significantly outperforms other accelerated DPM approaches, achieving a speedup of over 136 times compared to the original DPM release 5 without accuracy loss. To our knowledge, this is the most complete attempt to parallelize DPM release 5 on the GPU and evaluate it on challenging public datasets. As far as we know, this is also the fastest implementation of DPM yet.
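
To make the cost of dense evaluation concrete, the following is a minimal sketch of correlating one filter with a feature pyramid at every position and scale. It is an illustration under stated assumptions, not the DPM release 5 code or the DPM-GPU kernels: the function names (dense_scores, score_pyramid) and the nested-list data layout are hypothetical and chosen only for readability.

```python
# Illustrative sketch only: pure-Python dense filter scoring.
# All names and the nested-list layout are hypothetical, not taken from DPM release 5.

def dense_scores(feature_map, filt):
    """Correlate filt with feature_map at every valid position.

    feature_map: H x W x D nested lists (D feature channels per cell).
    filt:        h x w x D nested lists (the template weights).
    Returns an (H - h + 1) x (W - w + 1) grid of appearance scores.
    """
    H, W = len(feature_map), len(feature_map[0])
    h, w = len(filt), len(filt[0])
    scores = []
    for y in range(H - h + 1):
        row = []
        for x in range(W - w + 1):
            # The score at (y, x) does not depend on any other position,
            # so all positions could be computed in parallel.
            s = 0.0
            for dy in range(h):
                for dx in range(w):
                    cell = feature_map[y + dy][x + dx]
                    weights = filt[dy][dx]
                    s += sum(c * wt for c, wt in zip(cell, weights))
            row.append(s)
        scores.append(row)
    return scores


def score_pyramid(pyramid, filt):
    """Apply dense_scores to every scale level of a feature pyramid."""
    return [dense_scores(level, filt) for level in pyramid]
```

The work grows with the product of the number of positions, the number of scales, the filter area, and the feature depth, and it is repeated for every filter. Because each output score is independent of the others, this loop nest is a natural candidate for GPU parallelization.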