ALGORITHMIC OPTIMIZATION OF SOFTWARE IMPLEMENTATION OF ALGORITHMS FOR MULTIPLYING DENSE REAL MATRICES ON GRAPHICS PROCESSORS WITH OPENGL TECHNOLOGY SUPPORT

Y A Zatolokin, V S Titov, E I Vatutin

doi:10.21869/2223-1560-2017-21-5-06-15

Open Access

https://doi.org/10.21869/2223-1560-2017-21-5-06-15

Abstract

The article states the problem of matrix multiplication. It is shown that the problem is simple to formulate, but solving it efficiently may require both heuristic methods and a set of modifications, covering algorithmic and high-level software optimization, that take the particular problem into account and increase multiplication performance. These include a comparative analysis of performance with and without GPU-specific optimizations, which showed that computations that do not optimize access to global GPU memory have low processing performance. Optimizing the distribution of data between the GPU's global and local memory makes it possible to reduce calculation time and increase real performance. To compare the developed software implementations for OpenGL and CUDA, identical calculations were performed on identical GPUs; they showed higher real performance when using CUDA. Specific performance values measured for the multithreaded GPU implementation are given for each of the described optimizations. It is shown that the most effective approach is to cache sub-blocks of the matrices (tiles) in the GPU's on-chip local memory, which, with a specialized software implementation, provides a performance of 275.3 GFLOP/s on the GeForce GTX 960M GPU.
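The tile-caching idea named in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' OpenGL kernel: the function name `matmul_tiled`, the `tile` parameter and the use of plain Python lists are assumptions made for illustration. The copied sub-blocks stand in for the tiles that a work-group would hold in on-chip local memory.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply square matrices a and b (lists of lists) tile by tile.

    Hypothetical CPU emulation of tile caching: a_tile and b_tile play the
    role of the sub-blocks a GPU work-group would keep in local memory.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, n, tile):      # tile column of C
            for k0 in range(0, n, tile):  # step along the shared dimension
                # "Load" one tile of A and one tile of B into fast storage.
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Accumulate the partial product of the two cached tiles.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0.0
                        for k, a_ik in enumerate(a_row):
                            acc += a_ik * b_tile[k][j]
                        c[i0 + i][j0 + j] += acc
    return c
```

Each element of a cached tile is reused for every output element of the tile, which is the source of the reduced global-memory traffic that the abstract attributes to this technique. On a GPU the two inner loops would be spread across the work-items of a work-group.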

Highlights

The problem of finding the product of dense matrices arises in a number of scientific and technical fields
It is shown that the problem is simple to formulate, but solving it efficiently may require both heuristic methods and a set of modifications, covering algorithmic and high-level software optimization, that take the particular problem into account and increase multiplication performance. These include a comparative analysis of performance with and without GPU-specific optimizations, which showed that computations that do not optimize access to global GPU memory have low processing performance
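2. Vatutin Je.I., Martynov I.A., Titov V.S. Ocenka real'noj proizvoditel'nosti sovremennyh videokart s podderzhkoj tehnologii CUDA v zadache umnozhenija matric [Evaluation of the real performance of modern video cards with CUDA support in the matrix multiplication problem]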

Summary

TECHNICAL SCIENCES

Measurements of the real performance achieved on the NVidia GeForce GTX 960M GPU gave a value of 275.3 GFLOP/s, which is approximately 10–20% lower than the results obtained under the same experimental conditions for the same GPU using the CUDA toolkit. Algorithmic optimization of software implementation of algorithms for multiplying dense real matrices on graphics processors with OpenGL technology support // Proceedings of the Southwest State University. This OpenGL kernel is launched with a work-group size of 32 on the test GPUs, which were the GeForce GTX 960M and. Results of comparing processing performance on the CPU and the GPU for the implementation without optimization, CPU Intel Core i7-4750HQ + GPU GeForce GTX 960M. For the OpenGL platform, the j-th column of matrix B is cached in the fast local memory of the work-group. To optimize access to local memory, a multiplication algorithm that caches the i-th row of matrix A was also implemented; the corresponding kernel is given below.
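The kernels from the paper are not reproduced in this summary. As a rough illustration only, the column-caching scheme described above can be emulated on the CPU as follows. The function name `matmul_cached_column` and the mapping of one work-group to one column of the result are assumptions, not details taken from the source.

```python
def matmul_cached_column(a, b):
    """Multiply a (n x m) by b (m x p), caching one column of b at a time.

    Hypothetical emulation: b_col stands in for the copy of column j that a
    work-group would keep in fast local memory and share among work-items.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for j in range(p):                        # assumed: one work-group per column j
        b_col = [b[k][j] for k in range(m)]   # read column j from "global memory" once
        for i in range(n):                    # each work-item computes one c[i][j]
            c[i][j] = sum(a_ik * b_kj for a_ik, b_kj in zip(a[i], b_col))
    return c
```

The point of the sketch is that column j of B is read from slow memory once and then reused for every row of A. The row-caching variant mentioned above would mirror this, holding row i of A while iterating over the columns of B.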

Results of using this

References

Journal: Proceedings of the Southwest State University	Publication Date: Oct 28, 2017
Citations: 1	License type: cc-by
