Static Block-Recursive Cholesky Algorithm for a Distributed Memory Cluster

Gennadi Malaschonok, Andrii Ivaskevych

doi:10.18523/2617-3808.2020.3.114-120

Abstract

This paper investigates the block-recursive parallel algorithm for Cholesky decomposition on a supercomputer with distributed memory. For ease of scaling, the number of cores is a power of two. We investigate the resistance of the algorithm to the accumulation of computational errors and the scaling efficiency of the static block-recursive algorithm. Note that the problem does not have an exact solution in rational numbers (unlike, for example, LU decomposition, which does), so approximate calculations with some chosen accuracy are the only available approach. We have not been able to find systematic studies on the accumulation of error in the Cholesky algorithm, so we conduct such a study in this paper. The matrix elements are integers with a fixed number of binary digits and a uniform distribution. We show that for dense matrices, starting with matrix size 64, double precision (standard floating-point arithmetic with a 64-bit machine word) cannot achieve an error below 1, even in simple cases. The error only grows with the size of the coefficients, so such arithmetic becomes meaningless for even larger matrices. To solve this problem, we use BigDecimal arithmetic for large matrices. It allows the accuracy to be set programmatically as a number of decimal places. First, we determine the number of decimal digits that BigDecimal requires so that the calculation error for a given matrix does not exceed one, and then we experiment with different numbers of cores for that matrix. We give recommendations for dense matrices whose elements are chosen randomly with a uniform distribution. For such matrices, based on their size and the number of decimal places in their elements, we recommend the accuracy of the machine arithmetic and the number of cores to use for the calculations.

Manuscript received 09.06.2020

Highlights

This paper investigates the block-recursive parallel algorithm for Cholesky decomposition on a supercomputer with distributed memory
Based on the size of the matrices and the number of decimal places in their elements, we recommend the accuracy of the machine arithmetic and the number of cores to use for the calculations
The claim that the maximum error depends on the size of the input data was confirmed by the test results, from which we derived recommendations for users

Summary

Block-recursive Cholesky decomposition algorithm

Let a symmetric positive-definite matrix $A$ be given; we need to find a lower triangular matrix $L$ such that $A = L L^T$. We split each of these matrices into four blocks:

$$A = \begin{pmatrix} \alpha & \beta \\ \beta^T & \gamma \end{pmatrix}, \qquad L = \begin{pmatrix} a & 0 \\ b & c \end{pmatrix}, \qquad L^T = \begin{pmatrix} a^T & b^T \\ 0 & c^T \end{pmatrix}.$$

Multiplying $L$ by $L^T$ and equating the product to $A$ gives:

(1) $a a^T = \alpha$ (recursive step for $a$);
(2) $a b^T = \beta$; $b^T = a^{-1} \beta$; $b = (b^T)^T$;
(3) $b b^T + c c^T = \gamma$; $c c^T = \gamma - b b^T$ (recursive step for $c$).

We extend the algorithm so that it additionally computes the inverse of the matrix $L$.
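The recursion described above can be sketched in a few lines of Python. This is a minimal sequential illustration under stated assumptions, not the authors' distributed implementation: the helper names (`transpose`, `matmul`, `matsub`, `block_cholesky`, `max_error`) are hypothetical, Python's `decimal` module stands in for the BigDecimal arithmetic mentioned in the abstract, the precision of 50 digits is an arbitrary choice, the matrix size is assumed to be a power of two, and the inverse of $L$ is assembled with the standard block-triangular inverse formula, which the source does not spell out.

```python
from decimal import Decimal, getcontext

# Assumption: 50 significant digits; the paper tunes this value per matrix.
getcontext().prec = 50

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    cols = transpose(Y)
    return [[sum((x * y for x, y in zip(row, col)), Decimal(0)) for col in cols]
            for row in X]

def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block_cholesky(A):
    """Return (L, L_inv) with A = L * L^T.

    A is a symmetric positive-definite matrix of Decimal entries whose
    size is a power of two.
    """
    n = len(A)
    if n == 1:
        l = A[0][0].sqrt()
        return [[l]], [[1 / l]]
    h = n // 2
    alpha = [row[:h] for row in A[:h]]   # upper-left block
    beta = [row[h:] for row in A[:h]]    # upper-right block
    gamma = [row[h:] for row in A[h:]]   # lower-right block

    a, a_inv = block_cholesky(alpha)             # a * a^T = alpha
    b = transpose(matmul(a_inv, beta))           # b^T = a^{-1} * beta
    schur = matsub(gamma, matmul(b, transpose(b)))
    c, c_inv = block_cholesky(schur)             # c * c^T = gamma - b * b^T

    zero = [[Decimal(0)] * h for _ in range(h)]
    L = [ra + rz for ra, rz in zip(a, zero)] + [rb + rc for rb, rc in zip(b, c)]
    # Lower-left block of the inverse: -c^{-1} * b * a^{-1}
    low = [[-x for x in row] for row in matmul(c_inv, matmul(b, a_inv))]
    L_inv = ([ra + rz for ra, rz in zip(a_inv, zero)]
             + [rl + rc for rl, rc in zip(low, c_inv)])
    return L, L_inv

def max_error(A, L):
    """Largest absolute entry of A - L * L^T."""
    R = matsub(A, matmul(L, transpose(L)))
    return max(abs(x) for row in R for x in row)

if __name__ == "__main__":
    # Build a test matrix A = M * M^T from a known lower triangular M.
    M = [[Decimal(v) for v in row]
         for row in [[2, 0, 0, 0], [1, 3, 0, 0], [4, 1, 2, 0], [2, 5, 1, 1]]]
    A = matmul(M, transpose(M))
    L, L_inv = block_cholesky(A)
    print(max_error(A, L))
```

Raising or lowering `getcontext().prec` and re-running `max_error` gives a simple way to see how the working precision affects the residual for a given matrix.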

Scheme of the block-recursive parallel Cholesky decomposition algorithm

Main software components

Stages of the parallel algorithm's execution

Accumulation of error in the Cholesky algorithm