Abstract

Heterogeneous platforms that combine multi-core CPUs with different types of accelerators, such as GPUs and Xeon Phi, are becoming popular for data-parallel applications. The heterogeneity of the hardware mix and the diversity of the applications pose significant challenges to exploiting such platforms. Effective workload partitioning between processing units is therefore critical for improving application performance. The best partitioning depends on the hardware capabilities as well as on the application and the dataset used. In this work, we present a systematic approach to solving the partitioning problem. Specifically, we use modeling, profiling, and prediction techniques to quickly and correctly predict the optimal workload partitioning and the right hardware configuration to use. Our approach effectively characterizes platform heterogeneity, efficiently determines an accurate partitioning, and easily adapts to new platforms, different application types, and different datasets. Experimental evaluation on 13 applications shows that our approach delivers a performance improvement of $1.2\times$ to $14.6\times$ over single-processor execution, and partitioning accurate enough that, in most cases, the performance gap versus an oracle-based partitioning is below 10 percent.
