Abstract

This paper proposes a GPU-based near-data processing (NDP) architecture together with a well-matched programming model that considers both the characteristics of image applications and NDP constraints. First, data allocation to the processing units is managed to preserve data locality based on the memory access pattern. Second, this predictable allocation makes it possible to design a compact yet efficient NDP architecture. By applying a prefetcher that leverages the pattern-aware data allocation, the number of active warps and the on-chip SRAM size of the NDP are significantly reduced. This satisfies the NDP constraints and increases the opportunity to integrate more processing units on a memory logic die. Evaluation results for various image processing benchmarks show that the proposed NDP GPU improves performance compared to the baseline GPU.
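To illustrate the kind of pattern-aware, tile-based data allocation the abstract refers to, the sketch below shows a simple CUDA image kernel in which each thread block is assigned a contiguous row tile, so its accesses stay within a predictable, local region of memory. This is only a minimal illustration under assumptions: the tile size (TILE_ROWS), the box-blur kernel, and the block-to-tile mapping are not taken from the paper, and the paper's actual programming model and prefetcher are not shown here.

```cuda
// Illustrative sketch only: TILE_ROWS, the box-blur kernel, and the
// block-to-tile mapping are assumptions, not the paper's implementation.
// The point is that each block owns one contiguous row tile, so its
// memory accesses follow a simple, predictable (prefetch-friendly) pattern.
#include <cuda_runtime.h>
#include <stdio.h>

#define TILE_ROWS 8   // assumed number of rows allocated to one processing unit

// 3x3 box blur over a row tile; each block touches only its own tile plus a
// one-row halo, keeping accesses local and the stride pattern regular.
__global__ void boxBlurTiled(const unsigned char* in, unsigned char* out,
                             int width, int height)
{
    int tileStart = blockIdx.y * TILE_ROWS;         // first row of this tile
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    if (x >= width) return;

    for (int r = 0; r < TILE_ROWS; ++r) {
        int y = tileStart + r;
        if (y >= height) break;

        int sum = 0, count = 0;
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                int yy = y + dy, xx = x + dx;
                if (yy >= 0 && yy < height && xx >= 0 && xx < width) {
                    sum += in[yy * width + xx];
                    ++count;
                }
            }
        }
        out[y * width + x] = (unsigned char)(sum / count);
    }
}

int main(void)
{
    const int width = 1024, height = 1024;
    size_t bytes = (size_t)width * height;

    unsigned char *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemset(dIn, 128, bytes);  // dummy image data for the example

    dim3 block(256, 1);
    dim3 grid((width + block.x - 1) / block.x,
              (height + TILE_ROWS - 1) / TILE_ROWS);
    boxBlurTiled<<<grid, block>>>(dIn, dOut, width, height);
    cudaDeviceSynchronize();

    cudaFree(dIn);
    cudaFree(dOut);
    printf("done\n");
    return 0;
}
```

Because every block's working set is a fixed, contiguous tile plus a small halo, a hardware prefetcher can anticipate the next rows without relying on a large number of active warps to hide latency, which is the kind of property the abstract attributes to its pattern-aware allocation.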
