Abstract

Heterogeneous multi-core systems that contain multiple CPUs and GPUs are gaining momentum because they provide different kinds of computation power to meet the performance demands of modern applications. On such systems, developers try to fully utilize the computation power of both CPUs and GPUs by using emerging programming models such as CUDA and OpenCL. To achieve maximal performance, developers must carefully offload the appropriate workload to each compute device according to the characteristics of the target architecture. In this scenario, seamless data movement between different processors becomes crucial. Re-organizing the data layout to fit the target architecture helps address this concern, for example array-of-structures (AOS) for CPUs versus structure-of-arrays (SOA) for GPUs, or conversion from coordinate (COO) format to ELLPACK (ELL) format for sparse computation. In this paper, we propose a hardware memory manager that efficiently converts data layouts on the fly for heterogeneous multi-core systems. Our design addresses both memory coalescing and sparse format conversion. A novel ping-pong transpose architecture is devised to reorganize non-coalesced access patterns, and a histogram unit and a sparse address generator are presented to handle sparse storage format transformation. Our design reduces the overhead of data transfer and layout transformation between CPU and GPU. In our experiments, our design achieves speedups of 2.19 to 68.5 times over a software-based library, depending on data size.
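
To make the two layout transformations named above concrete, the following is a minimal software sketch in Python. It is not the hardware design proposed in the paper; the function names (`aos_to_soa`, `coo_to_ell`), the padding values, and the two-pass structure (count nonzeros per row, then generate a write slot for each entry) are illustrative assumptions that only loosely mirror the roles of a histogram step and an address-generation step.

```python
def aos_to_soa(records, fields):
    """Convert an array of structures (list of dicts) into a
    structure of arrays (dict of lists), one list per field."""
    return {f: [rec[f] for rec in records] for f in fields}


def coo_to_ell(rows, cols, vals, n_rows, pad_col=-1, pad_val=0.0):
    """Convert a sparse matrix from COO triples to padded ELL arrays.

    pad_col and pad_val are hypothetical padding markers chosen for
    this sketch; real libraries use their own conventions.
    """
    # Pass 1 (histogram): count the nonzeros in each row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1

    # ELL width is the maximum number of nonzeros in any row.
    width = max(counts, default=0)
    ell_cols = [[pad_col] * width for _ in range(n_rows)]
    ell_vals = [[pad_val] * width for _ in range(n_rows)]

    # Pass 2 (address generation): each entry goes to the next free
    # slot of its row; shorter rows keep their padding.
    next_slot = [0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        ell_cols[r][k] = c
        ell_vals[r][k] = v
        next_slot[r] += 1

    return ell_cols, ell_vals, counts
```

In this sketch, a 3x3 matrix with two, one, and three nonzeros in its rows produces ELL arrays of width three, with the shorter rows padded. The padding is what gives every row the same length, at the cost of extra storage when row lengths vary widely.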
