The object of this study is the performance of SYCL standard tools when solving the LU matrix decomposition problem. SYCL is a fairly new technology for parallel computing in heterogeneous systems, so evaluating its performance on specific parallel computing tasks is a relevant topic. In the study, a parallelized LU decomposition of a square matrix was implemented both with the SYCL standard and in standard C++, and an experiment was conducted to test the implementations in a heterogeneous system with several types of processors. During testing, the program received square matrices of various dimensions as input and reported the execution time of the LU decomposition on the selected processor. The results, presented as tables and graphs, show that the SYCL implementation outperforms the standard C++ implementation by a factor of more than two when run on a graphics processor. It was also shown experimentally that the SYCL implementation is almost as fast as the standard C++ implementation when executed on a central processor. These results are attributed both to the high degree of parallelism available in the LU decomposition algorithm itself and to the extensive optimization work done by the developers of the standard. The results indicate that SYCL can speed up LU decomposition and similar algorithms on heterogeneous systems with processors optimized for data parallelism. The results of the study can be used to justify the choice of technology for solving LU matrix decomposition problems or problems with a similar parallelization scheme.
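
The data parallelism mentioned above can be illustrated with a minimal sequential sketch in standard C++. It is not the implementation from the study: the function names `lu_decompose_in_place` and `max_reconstruction_error`, the Doolittle variant without pivoting, and the row-major storage are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the study): in-place Doolittle LU
// decomposition without pivoting of an n x n row-major matrix.
// On success, the strict lower triangle holds L (its unit diagonal is
// implied) and the upper triangle, including the diagonal, holds U.
// Returns false if an exactly zero pivot is encountered.
bool lu_decompose_in_place(std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    for (std::size_t k = 0; k < n; ++k) {
        const double pivot = a[k * n + k];
        if (pivot == 0.0) {
            return false;  // this sketch does no pivoting
        }
        // Compute the multipliers of column k; each row i is independent.
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i * n + k] /= pivot;
        }
        // Update the trailing submatrix; every (i, j) element is
        // independent of the others, which is the data-parallel part.
        for (std::size_t i = k + 1; i < n; ++i) {
            for (std::size_t j = k + 1; j < n; ++j) {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
    return true;
}

// Hypothetical helper: largest absolute difference between L*U,
// rebuilt from the packed factors, and the original matrix.
double max_reconstruction_error(const std::vector<double>& lu,
                                const std::vector<double>& original,
                                std::size_t n) {
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            const std::size_t last = (i < j) ? i : j;
            for (std::size_t k = 0; k <= last; ++k) {
                // L has an implied 1 on its diagonal.
                const double l = (k == i) ? 1.0 : lu[i * n + k];
                sum += l * lu[k * n + j];
            }
            const double diff = std::fabs(sum - original[i * n + j]);
            if (diff > worst) {
                worst = diff;
            }
        }
    }
    return worst;
}
```

For each step `k`, the two inner loops of the trailing update touch disjoint elements, so in a SYCL version they would be natural candidates for a `parallel_for` over a two-dimensional range, while the outer loop over `k` stays sequential. That mapping is an assumption about how such an implementation might be organized; the abstract does not describe the kernel structure used in the study.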