Abstract

Objective. The paper defines the relevance of the task of increasing the efficiency of software, which in this case is understood as reducing the operating time of the designed software in the process of solving computationally complex problems. Method. As an example of such a task, the implementation of the singular value decomposition by the Jacobi method is used. This task finds its application in various fields from signal and image processing to artificial intelligence systems. Parallel computing systems equipped with GPU are chosen as the target computing architecture. The paper discusses methods for improving the efficiency of software for target computing architectures using CUDA. Result. The existing analytical models for evaluating the effectiveness of computer programs are described. The influence of various optimizations, such as optimization of data transfers, use of the unified memory system, the number of threads, memory access patterns, and a number of others on the efficiency of the resulting software is considered. The process of optimizing the SVD implementation program is described, the results of computational experiments are presented. Conclusion. As the number of threads increases, performance may increase more than the number of threads. Impact of memory access pattern: When the memory access sequence is optimal, performance improves noticeably. Adjusting the share of memory used for L1 cache and shared memory does not have a significant impact on performance

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call