Abstract
Matrix multiplication is a fundamental operation in numerical linear algebra, and many scientific computing problems ultimately reduce to matrix multiplications. Performing matrix multiplication efficiently and accurately therefore helps accelerate the solution of other scientific problems. CUDA can fully exploit the co-processing power of the CPU and GPU to significantly improve computation speed on large-scale data. In this paper, we analyze the memory access mechanism of CUDA, optimize matrix multiplication with respect to global memory and shared memory, and evaluate the optimization results using interval estimation, hypothesis testing, and factor analysis. The results show that: the mean time for one multiplication of 1024 × 1024 matrices lies between 0.121 and 0.496 at a confidence level of 95%; across the nine multiplications of 1024 × 1024 matrices, all calculation times fall within the normal range under a normal distribution; and at a significance level of 0.05, the times obtained by the different functions for each matrix multiplication differ significantly.
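The shared-memory optimization mentioned above is commonly realized by tiling: each thread block stages a tile of each input matrix into fast on-chip shared memory, so every global-memory element is read once per tile rather than once per output element. The kernel below is a minimal illustrative sketch of this standard technique, not the authors' implementation; the tile size of 16 and the assumption that the matrix dimension `n` is a multiple of the tile size (true for 1024 × 1024) are choices made here for simplicity.

```cuda
#define TILE 16  // tile width; chosen here for illustration, assumed to divide n

// Computes C = A * B for n x n row-major matrices, with n a multiple of TILE.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    // Per-block staging buffers in shared memory.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // output row this thread computes
    int col = blockIdx.x * TILE + threadIdx.x;  // output column this thread computes
    float acc = 0.0f;

    // Walk across the shared dimension one tile at a time.
    for (int t = 0; t < n / TILE; ++t) {
        // Each thread loads one element of the current A tile and B tile,
        // turning n global reads per output element into n / TILE.
        As[threadIdx.y][threadIdx.x] = A[row * n + (t * TILE + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // wait until the whole tile is staged

        // Partial dot product over this tile, served from shared memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reads before the next tile overwrites the buffers
    }
    C[row * n + col] = acc;
}
```

A typical launch for 1024 × 1024 matrices would use a `dim3 block(TILE, TILE)` and `dim3 grid(n / TILE, n / TILE)` configuration, so that one thread produces one output element.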