Parallel SVD Algorithm for a Three-Diagonal Matrix on a Video Card Using the Nvidia CUDA Architecture

Mykola Semylitko, Gennadii Malaschonok

doi:10.18523/2617-3808.2021.4.16-22

Abstract

The SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, Big Data, and various matrix algorithms in which the matrices can be very large. Given the structure of the algorithm, it can be executed on a large number of computing threads, which only video cards provide.

CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope. Many applications leverage these capabilities to run faster on the GPU than on the CPU. Other computing devices, such as FPGAs, are also very energy efficient but offer much less programming flexibility than GPUs.

The developed modification uses the CUDA architecture, which is designed for a large number of simultaneous calculations and makes it possible to process very large matrices quickly. The parallel SVD algorithm for a three-diagonal matrix, based on Givens rotations, provides high accuracy. The algorithm also includes a number of optimizations of memory access and of the multiplication routines that significantly reduce computation time by discarding empty iterations.

This article proposes an approach that reduces computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and can be improved further by using dynamic parallelism or by parallelizing the multiplication operations. The results can also be used by other developers for comparison, since all conditions of the research are described in detail and the code is freely available.

Highlights

This work proposes an implementation of a parallel SVD algorithm for a three-diagonal matrix on a video card using the Nvidia CUDA architecture, intended for working with large matrices
To do this, the cosine and sine for the rotation matrix are computed by the following formula: and the Givens matrix will have the following form
The functions work as follows: 1) compute the offset of the elements in the three-diagonal matrix; 2) check the value of the element that is to be zeroed; if its absolute value is less than the precision, the transformation is not performed, and in that case we go to step 3; 3) compute the sine and cosine for the rotation matrix; 4) multiply the elements; 5) write the new values into the three-diagonal matrix; 6) compute the offset relative to the position of the square for the matrix L (R); 7) multiply the elements and write the new values to the matrix L (R) in global memory
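
The steps listed above can be illustrated with a minimal CPU-side sketch in C++. It is not the authors' CUDA kernel: the names `Givens`, `makeGivens`, `rotateRows`, and `eliminate` are hypothetical, the matrices are stored densely in row-major order rather than in the compact three-diagonal layout, and the sign convention (c = a / r, s = b / r, r = hypot(a, b)) is the common textbook one, which may differ from the one used in the paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type; the paper's actual kernels and storage layout
// are not shown in this text.
struct Givens {
    double c;  // cosine of the rotation angle
    double s;  // sine of the rotation angle
};

// Steps 2-3: skip the rotation when the element to be zeroed is already
// below the precision eps; otherwise compute c and s such that
// [c s; -s c] * [a; b] = [r; 0] with r = hypot(a, b).
bool makeGivens(double a, double b, double eps, Givens& g) {
    assert(eps > 0.0);  // assumption: a strictly positive precision threshold
    if (std::fabs(b) < eps) {
        return false;  // "empty" iteration, nothing to do
    }
    const double r = std::hypot(a, b);
    g.c = a / r;
    g.s = b / r;
    return true;
}

// Steps 4-5 and 7: rotate rows i and k of a dense row-major n x n matrix
// in place.
void rotateRows(std::vector<double>& m, std::size_t n,
                std::size_t i, std::size_t k, const Givens& g) {
    for (std::size_t j = 0; j < n; ++j) {
        const double x = m[i * n + j];
        const double y = m[k * n + j];
        m[i * n + j] = g.c * x + g.s * y;
        m[k * n + j] = -g.s * x + g.c * y;
    }
}

// One left rotation: zero a(k, col) against a(i, col) and accumulate the
// same rotation into l. Returns false if the rotation was skipped.
bool eliminate(std::vector<double>& a, std::vector<double>& l, std::size_t n,
               std::size_t i, std::size_t k, std::size_t col, double eps) {
    Givens g{};
    if (!makeGivens(a[i * n + col], a[k * n + col], eps, g)) {
        return false;
    }
    rotateRows(a, n, i, k, g);
    rotateRows(l, n, i, k, g);
    return true;
}
```

In this sketch, `l` would start as the identity matrix and collect the product of all left rotations; a right-hand counterpart acting on columns would accumulate R in the same way. On a GPU, the loop in `rotateRows` is the natural candidate for distribution across threads, since each column index is updated independently.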

Summary

This work proposes an implementation of a parallel SVD algorithm for a three-diagonal matrix on a video card using the Nvidia CUDA architecture, intended for working with large matrices. To this end, the sequential algorithm was studied, a model of the parallel algorithm that takes the specifics of the video card into account was developed in Java, and algorithms for the video card were implemented and tested using different types of video card memory; they can be used in Java and C/C++ programs. Keywords: singular value decomposition, SVD, Nvidia CUDA, Java, C++. The SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, Big Data, and various matrix algorithms in which the matrices can be very large. Given the structure of this algorithm, it can be executed on a large number of computing threads, which only video cards provide. This approach makes it possible to reduce computation time and, consequently, the resources and monetary costs required

SVD algorithm for a three-diagonal matrix

When multiplying the input matrix by the matrix

Parallel algorithm

Optimization of multiplication for the steps of the algorithm

Termination of the algorithm

Test results

Number of iterations

Accuracy of calculations

References

Journal: NaUKMA Research Papers. Computer Science
Publication Date: Dec 10, 2021
Citations: 1
License type: CC BY 4.0
