Abstract

OpenMP provides target offloading directives that invoke accelerators on heterogeneous platforms to execute core code segments; however, careless use of these directives can make data transfer time-consuming. Unused array transfer and unused data segment transfer occur when the amount of data transferred from the host to the device exceeds the amount that the core computation on the device actually requires. For unused arrays, a filter is added that determines which transmitted arrays are actually used, eliminating the transfer of redundant data. For unused data segments, array usage is quickly determined with the same filter, and only valid data are transmitted by optimizing Clang's code generation strategy once the lengths of the data segments used in the core computation have been obtained. In experiments with the Polybench benchmark, the speedup from optimizing unused array transfer reaches 7%, and the speedup from optimizing unused data segment transfer reaches 10%. The experimental results show that data transfer optimization tailored to the characteristics of target offloading can help improve program performance.
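The unused data segment problem can be illustrated with a small hand-written sketch. It is not the Clang code generation change described above; it only shows the effect that change aims for, using a standard OpenMP array section in the `map` clause. The function name `scale_prefix` and its parameters are hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: scale the first `used` elements of `a` (capacity `n`)
 * and return their sum. Only the array section a[0:used] is mapped, so the
 * tail of `a` is never copied between host and device. */
double scale_prefix(double *a, size_t n, size_t used, double factor)
{
    assert(used <= n);  /* the mapped section must lie inside the array */
    double sum = 0.0;

    /* A wasteful alternative would be map(tofrom: a[0:n]), which copies
     * all n elements even though the loop touches only `used` of them. */
    #pragma omp target teams distribute parallel for \
            map(tofrom: a[0:used]) map(tofrom: sum) reduction(+: sum)
    for (size_t i = 0; i < used; ++i) {
        a[i] *= factor;
        sum += a[i];
    }
    return sum;
}
```

Calling `scale_prefix(a, 4, 2, 2.0)` on an array of four elements maps only the first two. When compiled without OpenMP support the pragma is ignored and the loop runs on the host; with Clang, `-fopenmp` together with a suitable `-fopenmp-targets=` value enables offloading.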

