Abstract

The CPU-GPU Processing Element (PE) has become a very popular architecture for building modern multiprocessing systems because of its high performance on massively parallel processing and vector computations. Power dissipation is one of the important factors influencing the design and development of High Performance Computing (HPC) systems, since a large-scale scientific computation may use thousands of processors and hundreds of hours of continuous execution, resulting in an enormous energy burden. Improving the utilization of an individual PE so that it reaches its best computation capability and power efficiency is valuable for reducing the overall power cost of large multiprocessing systems. The power performance of a CUDA PE depends on the electrical features of its internal hardware components and their interconnections, as well as on the high-level applications and parallel algorithms executed on it. Based on measurements and experimental evaluations, in this work we provide a load sharing method that adjusts the workload assignment between the CPU and GPU components inside a CUDA PE in order to optimize the overall power efficiency. The improvements in computation time and power consumption have been validated by examining program executions when the above method is applied on real systems.
