Abstract

OpenMP is one of the mainstream parallel programming models in recent years. After version 4.0, OpenMP introduced a new target instruction to increase the functionality of heterogeneous programming, called OpenMP Offload. For the domestic heterogeneous platform DCU, the thread scheduling algorithm under OpenMP parallel computing has low performance in the default mode, which does not take the best advantage of GPU parallel computing and has wasted resources. To address this problem, this paper performs algorithm improvement at the compiler level, analyzes the available resources of the system by combining the DCU hardware facilities, then further parses the program based on its array information to get its program iteration number, reallocates the number of threads for different execution modes in OpenMP, and optimizes the thread group increase factor by combining the DCU hardware information to adjust the thread. This paper uses the SPEC ACCEL test set to optimize the number of threads in the DCU. In this paper, we use the SPEC ACCEL test set and Polybench standard test set to test the redistribution of threads and thread groups in two parallel modes using the thread scheduling optimization algorithm. The average speedup ratio of ACCEL was improved by 40%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call