Abstract

As multicore architectures overtake single-core architectures in current and future computing systems, traditional applications built on sequential algorithms can no longer rely on technology scaling to improve performance. Instead, applications must adopt parallel algorithms to take advantage of multicore performance. Image processing applications exhibit a high degree of parallelism and are excellent candidates for multicore systems. However, simply exploiting parallelism is not enough to achieve the best performance. Optimization must also account for underlying architectural characteristics such as wide vector units and limited memory bandwidth. This article illustrates techniques for optimizing performance on multicore x86 systems for three key image processing kernels: fast Fourier transform, convolution, and histogram.

