Abstract

Abstract In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including bandwidths and latencies, partitioning, merging, initialization, overhead for computational kernel launch and cleanup. We show that theoretical results from our model are close to real results as differences do not exceed 5% for larger data sizes, with up to 16.7% for smaller data sizes. For an exemplary workload based on solving systems of equations of various sizes with various compute-to-communication ratios we demonstrate that using an integer linear programming solver (lp_solve) with timeouts allows to obtain significantly better total (solver+application) run times than runs without timeouts, also significantly better than arbitrary chosen ones. We show that OpenCL 1.2’s device fission allows to obtain better performance in heterogeneous CPU+GPU environments compared to the GPU-only and the default CPU+GPU configuration, where a whole device is assigned for computations leaving no resources for GPU management.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.