Abstract

Block matching algorithms (BMAs) such as the sum of absolute differences (SAD) and normalised cross-correlation (NCC) are basic building blocks in many computer vision applications. Full search (FS) BMAs take into account every pixel in a block, and are therefore very effective for wide motion search using different blocks. However, FS is computationally very expensive and unsuitable for real-time applications. We present efficient strategies for extracting data parallelism in the FS SAD and NCC algorithms, with fine-grained optimisation techniques for fully exploiting the computational capacity of NVIDIA's multicore graphics processing units (GPUs). We demonstrate that the proposed parallel implementation achieves a speedup of around 34× for SAD and 59× for NCC on a GTX 280 GPU, compared with a sequential implementation on a dual-socket quad-core Xeon processor at 2.50 GHz with 16 GB of DRAM. Since GPUs are inexpensive and widely available, our algorithm can form the basis for many real-time video systems.
