Abstract

Despite the increasing investment in integrated GPUs and next-generation interconnect research, discrete GPUs connected by PCI Express still account for the dominant position of the market, the management of data communication between CPU and GPU continues to evolve. Initially, the programmer controls the data transfer between CPU and GPU explicitly. To simplify programming and enable system-wide atomic memory operations, GPU vendors have developed a programming model that provides a single virtual address space. The page migration engine in this model migrates pages between CPU and GPU on demand automatically. To meet the needs of high-performance workloads, the page size tends to be larger. Limited by low bandwidth and high latency interconnects, larger page migration has longer delay, which may reduce the overlap of computation and transmission and cause serious performance decline. In this paper, we propose partial-page migration that only migrates the requested part of a page to shorten the migration latency and avoid the performance degradation of the whole-page migration when the page becomes larger. Experiments show that partial-page migration is possible to significantly hide the performance overheads of whole-page migration when the page size is 2MB and the PCI Express bandwidth is 16GB/sec, converting an average 72.72× slowdown to a 1.29× speedup when compared with programmers controlled data transmission. Additionally, we examine the impact of page size on TLB miss rate and the performance impact of migration unit size on execution time, enabling designers to make informed decisions.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call