Abstract

This paper presents a performance comparison between CUDA and OpenACC. The performance analysis focuses on the programming models and their underlying compilers. In addition, we propose a Performance Ratio of Data Sensitivity (PRoDS) metric to compare objectively, rather than subjectively, how sensitive OpenACC and CUDA implementations are to changes in data size. The results show that, in terms of kernel running time, OpenACC performs worse than CUDA because the PGI compiler must translate OpenACC kernels into object code, whereas CUDA code can be run directly. Moreover, with optimizations, OpenACC programs are more sensitive to data changes than the equivalent CUDA programs, but without optimizations CUDA is more sensitive to data changes than OpenACC. Overall, we found that OpenACC is a reliable programming model and a good alternative to CUDA for accelerator devices.

Highlights

  • High-performance computing has been applied to more and more scientific research applications, such as chemical reaction processes, information retrieval systems [1], explosion processes and fluid dynamic processes

  • The remainder of this paper is organized as follows: Section 2 discusses related work on performance comparisons of diverse parallel programming models; Section 3 describes our configuration, testbeds and the selected benchmarks; Section 4 presents the methodology used in this paper; Section 5 describes the overall performance analysis results, and identifies and explains the main differences between CUDA and OpenACC; Section 6 presents the background of CUDA and OpenACC; Section 7 concludes this paper and presents our future work

  • The results in [2] showed that OpenACC is generally slower than CUDA; however, we found that all of those results were based on only two micro-benchmarks and one application


Summary

Introduction

High-performance computing has been applied to more and more scientific research applications, such as chemical reaction processes, information retrieval systems [1], explosion processes and fluid dynamic processes. Because OpenACC was motivated by simplifying CUDA programming while offering similar functionality, this paper compares the performance of CUDA and OpenACC, for two reasons: (1) CUDA is one of the most popular parallel programming models, whereas OpenACC is an easily learned and simplified high-level parallel language, especially for programming beginners; and (2) one motivation for developing OpenACC is to simplify low-level parallel languages such as CUDA. The remainder of this paper is organized as follows: Section 2 discusses related work on performance comparisons of diverse parallel programming models; Section 3 describes our configuration, testbeds and the selected benchmarks; Section 4 presents the methodology used in this paper; Section 5 describes the overall performance analysis results, and identifies and explains the main differences between CUDA and OpenACC; Section 6 presents the background of CUDA and OpenACC; Section 7 concludes this paper and presents our future work.
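To make the "simplified high-level" point concrete, the following is a minimal sketch of the directive-based OpenACC style in C. It is an illustration only and is not taken from the paper's benchmarks: the SAXPY kernel, the function name `saxpy`, and the choice of data clauses are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): y[i] = a * x[i] + y[i].
 * The single OpenACC directive asks the compiler to offload the loop and to
 * manage the host/device transfers named in the data clauses. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is only read on the device; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

An equivalent CUDA version would typically need an explicit `__global__` kernel, device allocations, host-to-device and device-to-host copies, and a launch configuration, which is the boilerplate that the directive replaces. A compiler without OpenACC support generally ignores the pragma and runs the loop serially on the host, so the function produces the same result either way.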

Related work
Testbeds
Benchmarks selection
Data sensitivity
Comparison on data sensitivity
Comparison on programming model
Background
Findings
Conclusions and future work