Abstract

Heterogeneous platforms consisting of CPUs and add-on streaming processors are widely used in modern computer systems. These add-on processors provide substantially more computational capability and memory bandwidth than conventional multi-core platforms, and general-purpose computations can also be offloaded onto them. Programming these streaming processors to realize their potential performance is challenging, however, because of their diverse underlying architectural characteristics. Several optimization techniques are applied on OpenCL-compatible heterogeneous platforms to achieve thread-level, data-level, and instruction-level parallelism. The architectural implications of these techniques and the underlying optimization principles are discussed. Finally, a case study of the MRI-Q benchmark illustrates the capabilities of these optimization techniques. The experimental results show that the speedup from the non-optimized to the optimized kernel ranges from 8x to 63x, depending on the target platform.
