Abstract

Modern workstations are equipped with fast cache memory so that the CPU can access the relatively slow main memory without noticeable delay. However, two typical cache characteristics, limited associativity and a power-of-two-based mapping of memory addresses onto cache lines, cause the entire class of separable image processing algorithms to exhibit the worst possible data cache utilization on large images. We present three methods, all based on transposing the image, that improve data cache usage for both write-through and write-back caches. Experiments with a 3 × 3 uniform filter and the fast Fourier transform, performed on a range of Sun workstations, show that the proposed methods improve performance considerably.
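The conflict described above can be illustrated with a small model of a direct-mapped cache. This is only a sketch under assumed parameters: the line size (32 bytes), the number of sets (512), the pixel size (4 bytes), and the image width (4096 pixels) are hypothetical values chosen for illustration and are not taken from the paper, as are the helper names `cache_set` and `sets_touched`.

```python
# Illustrative model only: all cache and image parameters are assumptions.

def cache_set(address, line_size=32, num_sets=512):
    """Set index for a byte address under a power-of-two mapping."""
    return (address // line_size) % num_sets

def sets_touched(stride_bytes, count, line_size=32, num_sets=512):
    """Distinct cache sets used when reading `count` items `stride_bytes` apart."""
    return {cache_set(i * stride_bytes, line_size, num_sets) for i in range(count)}

PIXEL_BYTES = 4                    # assumed pixel size
WIDTH = 4096                       # assumed image width (a power of two)
ROW_STRIDE = WIDTH * PIXEL_BYTES   # bytes between vertically adjacent pixels

# Walking down a column: consecutive pixels are one full row apart.
column_walk = sets_touched(ROW_STRIDE, 256)

# Walking along a row: consecutive pixels are adjacent in memory.
row_walk = sets_touched(PIXEL_BYTES, WIDTH)

print(len(column_walk), "set(s) used by the column pass")
print(len(row_walk), "set(s) used by the row pass")
```

In this model every pixel of a column maps to the same cache set, so with limited associativity each access evicts the line loaded by the previous one, whereas the row pass spreads over all sets. Transposing the image turns the column pass into a row pass, which is the general idea behind the transposition-based methods mentioned in the abstract.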
