Solving Linear Systems of Equations with Block Skyscraper Structure

Abstract

This paper proposes an efficient parallel algorithm for solving systems of linear algebraic equations (SLAE) with block skyscraper matrices, designed for execution on shared-memory computing systems. The algorithm is focused on achieving high performance and scalability, making it suitable for resource-intensive computational tasks. Special attention is given to optimizing computations and distributing the workload among processor cores to maximize acceleration. The study includes testing the proposed algorithm on nodes of a modern computational cluster by solving practical problems related to modeling the strength of building structures. The performance characteristics of the algorithm, including total execution time and acceleration coefficients depending on the number of processor cores used, have been analyzed. The impact of the block sizes used in the calculations on computational performance has also been investigated. The experimental results demonstrate that the algorithm significantly reduces execution time as the number of processors increases and exhibits stable scalability on systems with a large number of cores. This highlights its applicability for solving complex engineering problems and modeling large-scale systems. The proposed approach can be utilized in structural mechanics, applied physics, and other fields of engineering analysis requiring the processing of large data volumes. The conclusions of the study are valuable for further development of parallel programming methods and improving the efficiency of computational systems.

Keywords: high-performance computing, system of linear algebraic equations, Cholesky method, sparse matrices, block skyscraper matrices, parallel algorithms, systems with shared memory.
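The abstract does not give the algorithm itself, but the blocked Cholesky factorization it is built on can be sketched. Below is a minimal pure-Python illustration of a right-looking blocked Cholesky plus forward/back substitution; the block size `bs` and the dense storage are simplifying assumptions (the paper exploits block skyscraper sparsity, which this sketch ignores):

```python
import math

def blocked_cholesky(A, bs):
    """Right-looking blocked Cholesky: dense SPD matrix A (list of lists)
    -> lower-triangular factor L with A = L * L^T.
    In a shared-memory parallel version the trailing update (step 3) is
    the bulk of the work and splits naturally across processor cores."""
    n = len(A)
    L = [row[:] for row in A]  # work on a copy
    for k in range(0, n, bs):
        kb = min(bs, n - k)
        # 1. factor the kb x kb diagonal block in place
        for i in range(k, k + kb):
            for j in range(k, i + 1):
                s = sum(L[i][p] * L[j][p] for p in range(k, j))
                if i == j:
                    L[i][i] = math.sqrt(L[i][i] - s)
                else:
                    L[i][j] = (L[i][j] - s) / L[j][j]
        # 2. panel: triangular solve for the rows below the diagonal block
        for i in range(k + kb, n):
            for j in range(k, k + kb):
                s = sum(L[i][p] * L[j][p] for p in range(k, j))
                L[i][j] = (L[i][j] - s) / L[j][j]
        # 3. trailing update: subtract this panel's contribution
        #    from the remaining submatrix (the parallelizable part)
        for i in range(k + kb, n):
            for j in range(k + kb, i + 1):
                L[i][j] -= sum(L[i][p] * L[j][p] for p in range(k, k + kb))
    for i in range(n):             # zero the strict upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L

def cholesky_solve(L, b):
    """Solve A x = b given A = L L^T, by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x
```

For example, with A = [[4,2,0],[2,3,1],[0,1,2]] and b = [8,11,8], the solve returns [1, 2, 3]. A production version would additionally skip the zero blocks that the skyscraper structure guarantees.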

Similar Papers
  • Research Article
  • 10.12694/scpe.v5i2.274
Parallel Numeric Algorithms On Faster Computers
  • Jan 3, 2001
  • Scalable Computing Practice and Experience
  • Roman Trobec

High performance parallel computers provide the computational rates necessary for computer simulations based on parallel numerical methods. Large-scale natural phenomena, experiments that would cost vast amounts of money, and those that are ecologically problematic or dangerous, can be first simulated in order to foresee the expected results. There are two possibilities to increase computational performance: either by increasing the performance of a single processor or by increasing the number of processors. Nowadays it seems that the advance in performance of a single processor cannot follow Moore's prediction that computer performance will double every two years. The slower rate is being temporarily compensated by the increasing number of processors that can cooperate on the same problem. Today, it is not unusual to utilise parallel machines with several thousand computers. But are such huge parallel machines able to execute numerical algorithms efficiently? It is known that only scalable algorithms can be implemented efficiently on a massively parallel machine. Typically, such algorithms need a very small amount of cooperation, i.e., communication between processors. However, a huge number of processors cannot be exploited adequately if the problem size remains independent of the available processors. New parallel algorithms are being published that can be implemented only on a limited number of processors because an even distribution of computational load cannot be achieved. In such cases the speed-up is limited. These algorithms are of marginal value for practical use, because their parallelization efforts can be compensated by faster computers. If the theoretical speed-up of some improved parallel algorithm is limited, which is often the case, then it would be better to use a faster computer. Coding, testing and documenting a new algorithm, which offers only a small speed-up, is inefficient.
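The limited speed-up described above is quantified by Amdahl's law: if only a fraction of the work parallelizes, the serial remainder caps the achievable acceleration no matter how many processors are added. A small sketch (the 95% figure is an arbitrary illustration, not from the article):

```python
def amdahl_speedup(f, p):
    """Amdahl's law: speed-up on p processors when a fraction f
    (0 <= f <= 1) of the work is parallelizable."""
    return 1.0 / ((1.0 - f) + f / p)

# Even with 95% of the work parallel, 1000 processors give less than
# 20x: the serial 5% dominates, so the speed-up is limited regardless
# of p (the ceiling is 1 / (1 - f) = 20).
```

This is exactly why an evenly distributable, scalable algorithm matters more on massively parallel machines than a clever but serialization-heavy one.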
Although a smarter algorithm may have lower asymptotic complexity, the multiplicative constant is usually higher because: it typically performs more operations per iteration (e.g. iterative methods for linear equation systems), it uses more memory, resulting in more cache misses, and it sometimes has a more complicated control flow, making it harder for the compiler to produce optimal assembly code and for the CPU to avoid pipeline stalls. Therefore, algorithm development must take into account the current situation and predicted further developments of CPUs and compilers. Any algorithmic complications also make changes to programs in the future harder. On the other hand, as new algorithms become widely used, they find their way into numerical libraries such as BLAS, POOMA and others. After that the use of state-of-the-art algorithms requires little more effort than standard algorithms. Again, widely used applications can typically use these libraries because they were designed with such calculations in mind. Special purpose programs often have special demands that the above mentioned libraries cannot meet. Numerical algorithms that are good for manual calculation are not always suitable for computers (complicated coding, memory requirements, etc.). So algorithms with balanced computational and memory requirements can be advocated in numerical analysis, because they have the potential to be effectively parallelized. Both ideas, scalability and balanced computational and memory requirements, have to be considered in the development of new numerical algorithms and in potential reconsideration of the standard algorithms. Besides, the effort needed to code a new improved algorithm has to be taken into account. Finally, the number of expected computer experiments has to be foreseen. If, for example, only a small number of experiments are expected, then we can take an old and potentially slow algorithm.
If its execution time is unacceptably long, then something new, usually based on compromises, has to be developed. The time needed to finish a computer experiment is one factor that has to be considered in our decision about investment in development of a new parallel algorithm. On the other hand, the number of expected runs of such an experiment is even more important. For example, if we expect only a single or very small number of runs, then we can usually use an existing old algorithm, particularly if the execution time is rather short, say no more than 24 hours or sometimes up to 240 hours. There is no problem at all with shorter experiments. But if we expect several thousand runs to be necessary, then the strategy should be changed. Many experiments can be run in parallel. Parallelization of such a set of independent experiments is usually simple and the benefits can be enormous. In CHARMM (a widely used simulation programme in structural chemistry), for example, even a 5% speed-up is probably worth a few thousand programming hours, while custom programs that are used only within an organization are another matter. The general complex algorithms needed by many users, or those that form a part of standard parallel libraries, remain the main problem. They have to be coded with great caution in order to be implemented in the best possible way to offer near ideal speed-up for any number of processors. Research work should be concentrated on this issue. Efficient system programs and tools are needed for further development of parallel programs. Ordinary users do not need to know anything about parallelization; they just want to perform complex experiments in as short a time as possible. Not only computational performance but also memory requirements play an important role in the parallelization strategy. Sometimes, enough memory or cache has a greater impact on the final performance than the precise coding of a parallel program.
The essential fact here is that parallel memory access will improve the performance of the parallel program. Data should be distributed in such a way that permits an even distribution of computation, enabling the use of virtually the same program on all machines regardless of memory type. Such an approach usually leads to an almost ideal speed-up and therefore to applications appropriate for massively parallel machines. The final issue that has to be considered is the set of ideas covered by autonomous computing, i.e., self-organization, fault-tolerance, etc. It implicitly assumes distributed and parallel systems. While distributed computing may use fast connected computers, parallel computers are composed of fast computers connected with equally fast communication media. Comparing these two types of machines we recognise that they need different treatment. Distributed systems or grids are able to perform only ideally scalable algorithms with virtually no communication, while parallel computers have the potential to perform complicated algorithms with high communication requirements. It is usually supposed that processor performances have to be improved in order to improve the overall performance of parallel algorithms. But suppose that a technological breakthrough were to be made in the communication sector. Suppose that communication speed would become virtually unlimited. In this case the computer performance achieved today would become sufficient, and the effort needed to be invested in parallelization tools would become much smaller than in the case of fast processors. It seems that research into fast communication methods would be at least as efficient as that focused on increasing processor speed. If we speculate on future developments in the area of parallel and distributed computing we may conclude that there will be no unique computer design covering all user requirements.
Most of us will use home and mobile computers with simple access to different types of specialized high performance computers with fast response for a price comparable with the price of a phone call. Some of us will work on the system software for these specialized computers, probably within several non-standard frameworks. Then, something will happen that will initiate the next cycle in the development of the next generation of thinking machines …

Roman Trobec, Jozef Stefan Institute, Slovenia

  • Research Article
  • Cited by 9
  • 10.12694/scpe.v2i4.155
Parallel sparse matrix algorithms for air pollution models
  • Jan 1, 2001
  • Scalable Computing Practice and Experience
  • Krassimir Georgiev + 1 more

Mathematical models are indispensable tools in different environmental studies. Such models are usually described by systems of partial differential equations (PDEs). The number of equations in the PDE system is equal to the number of the chemical species involved in the model. The application of different discretization and splitting techniques transforms the system of partial differential equations into five very large systems of ordinary differential equations (ODEs). The treatment of the ODE systems leads to the solution of several large systems of linear algebraic equations at every time-step. Sparse matrix techniques can be used to reduce the number of arithmetic operations. The efficiency can be further increased by applying parallel algorithms. The use of a special sparse matrix algorithm and the parallelization of the computational process are discussed in this paper.
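The way such sparse-matrix techniques cut arithmetic can be illustrated with compressed sparse row (CSR) storage, where a matrix-vector product touches only the nonzero entries. A minimal sketch (the example matrix is invented, not from the paper):

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x for A stored in CSR form: only the nonzero entries
    (values), their column indices (col_idx), and per-row offsets
    (row_ptr) are kept, so work is proportional to the nonzero count."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [1, 0, 4]]  stored as CSR:
values  = [2.0, 1.0, 3.0, 1.0, 4.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
```

For the matrix above, `csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0])` performs 5 multiplications instead of the 9 a dense product would need; the saving grows dramatically for the very large, very sparse matrices air pollution models produce.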

  • Research Article
  • Cited by 23
  • 10.1016/j.parco.2013.03.003
A fast parallel algorithm for solving block-tridiagonal systems of linear equations including the domain decomposition method
  • Jun 1, 2013
  • Parallel Computing
  • Andrew V Terekhov


  • Research Article
  • Cited by 3
  • 10.1016/0010-4655(81)90092-8
Recurrence solution of a block tridiagonal matrix equation with Neumann, Dirichlet, mixed or periodic boundary conditions
  • Nov 1, 1981
  • Computer Physics Communications
  • F Marsh + 1 more


  • Book Chapter
  • 10.1007/978-3-030-64616-5_11
Resource-Efficient Parallel CG Algorithms for Linear Systems Solving on Heterogeneous Platforms
  • Jan 1, 2020
  • Nikita S Nedozhogin + 2 more

The article discusses the parallel implementation of solving systems of linear algebraic equations on a heterogeneous platform containing a central processing unit (CPU) and graphics accelerators (GPUs). The performance of parallel algorithms for the classical conjugate gradient method schemes when using the CPU and GPU together is significantly limited by the synchronization points. The article investigates the pipeline version of the conjugate gradient method with one synchronization point, the possibility of asynchronous calculations, and load balancing between the CPU and GPU when solving large linear systems. Numerical experiments were carried out on test matrices and computational nodes of different performance of a heterogeneous platform, which allowed us to estimate the contribution of communication costs. The algorithms are implemented with the combined use of MPI, OpenMP and CUDA. The proposed algorithms, in addition to reducing the execution time, allow solving large linear systems for which the memory resources of one GPU or one computing node are not sufficient. At the same time, the pipelined block algorithm decreases the total execution time by reducing the number of synchronization points and aggregating several messages into one.
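For reference, the classical conjugate gradient iteration that the pipelined variant reworks can be sketched in a few lines. This is a plain serial illustration (dense list-of-lists matrix, no CPU/GPU split); the two dot-product reductions per iteration are the synchronization points the paper's pipelined scheme collapses to one:

```python
def cg(A, b, tol=1e-12, max_iter=1000):
    """Classical conjugate gradients for a symmetric positive-definite A.
    Each iteration: one matrix-vector product and two dot-product
    reductions (the global synchronization points on parallel hardware)."""
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    x = [0.0] * n
    r = b[:]                       # residual for the zero initial guess
    p = r[:]
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

On the toy system A = [[4,1],[1,3]], b = [1,2], the exact solution is [1/11, 7/11] and CG reaches it in two iterations.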

  • Research Article
  • 10.24891/fc.30.1.124
Solving systems of homogeneous linear algebraic equations in pricing problems of an alternative financial model
  • Jan 30, 2024
  • Finance and Credit
  • Sergei S Kuznetsov + 4 more

Subject. This article focuses on the problems of solving systems of homogeneous linear algebraic equations in the pricing problems of an alternative financial model. Objectives. The article aims to develop a non-trivial approach to solving the problem of pricing from scratch in the construction of a national alternative financial model, which boils down to solving large homogeneous systems of linear algebraic equations, complicated by the fuzzy setting of initial parameters (initial data), within the framework of creating a State model of a socially oriented economy, guaranteed to ensure economic, military and other types of integrated security of the Russian Federation. Methods. For the study, we used tensorial notations involving the use of elements of linear algebra, including operations with matrices and vectors, the apparatus of fuzzy set theory (interval arithmetic), as well as methods and principles of scientific research and complex logical analysis and modeling of pricing and technological processes. Results. The article shows that an effective non-zero solution (as opposed to the classical trivial zero solution) of systems of homogeneous algebraic equations in pricing problems of an alternative financial model from scratch is possible if the initial parameters are specified by fuzzy sets, which significantly simplifies the computational procedure and reduces the volume of arithmetic operations. Conclusions. The results obtained can be used in structures involved in the creation and testing of financial and economic models both at the federal and the Russian Federation constituent entity levels, as well as in governmental and non-governmental organizations dealing with the issues of pricing, planning and logistics support for business entities not only in the Russian Federation, but also in other countries experiencing problems in the field of financing.
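The basic linear-algebra fact behind the abstract is worth stating concretely: a homogeneous system with more unknowns than equations always admits a nonzero solution. For the smallest case, two equations in three unknowns, that solution is simply the cross product of the coefficient rows. A toy sketch (this illustrates the underlying algebra only, not the paper's fuzzy-set method):

```python
def nontrivial_solution_2x3(r1, r2):
    """Nonzero x with r1 . x = 0 and r2 . x = 0 for a homogeneous 2x3
    system: the cross product of the two coefficient rows (valid when
    the rows are linearly independent)."""
    return [r1[1] * r2[2] - r1[2] * r2[1],
            r1[2] * r2[0] - r1[0] * r2[2],
            r1[0] * r2[1] - r1[1] * r2[0]]
```

For rows [1, 2, 3] and [4, 5, 6] this returns [-3, 6, -3], which both equations annihilate; any scalar multiple is also a solution, which is exactly the freedom a pricing model can exploit.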

  • Conference Article
  • 10.1063/1.5007422
High-performance computing on GPUs for resistivity logging of oil and gas wells
  • Jan 1, 2017
  • V Glinskikh + 3 more

We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.

  • Research Article
  • Cited by 1
  • 10.1016/j.proeng.2015.12.067
Simulation of Electrical Circuits Using Conjugate Gradient Algorithm
  • Jan 1, 2015
  • Procedia Engineering
  • Y.A Burtsev


  • Research Article
  • Cited by 1
  • 10.24027/2306-7039.1.2021.228201
Particular properties of estimation of partial capacitances of insulation of three core power cables by applying aggregate measurements
  • Mar 31, 2021
  • Ukrainian Metrological Journal
  • I A Kostiukov

This paper presents a description of specific properties of determining the values of partial capacitances of insulation gaps in power cables with paper insulation for various ways of forming and solving the system of linear algebraic equations. Possible ways of inspecting the insulation of three-core power cables to estimate the values of partial capacitances by aggregate measurements, based on different ways of connecting the emittance meter to the tested cable sample, are given. Estimation of partial capacitances by the direct solution of a system of linear algebraic equations, by minimizing the root mean square error of solving an overdetermined system of equations by the least squares method, as well as by finding a normal solution of an indefinite system of equations by the pseudo-inverse matrix, is also considered. It is shown that minimization of the root mean square error by the least squares method and the direct solution of the system of equations give quite similar results for estimating partial capacitances by means of aggregate measurements; at the same time, the solution of an indefinite system of equations by the pseudo-inverse matrix method accurately reproduces only 3 of the 6 values of partial capacitances. The uneven effect of frequency on the electrical capacitance of the insulation gaps between the cores of the power cable and between its cores and the sheath is shown. It was proposed to use the frequency dependence of the electrical capacitance of insulation gaps as an informative parameter on the technical state of insulating gaps between the cores of the power cable and between its cores and its sheath.
Keywords: root mean square error; least squares method; system of linear algebraic equations; dielectric losses; dielectric permittivity.
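The least-squares route the paper compares can be sketched for the overdetermined case: minimize the residual of A x = b via the normal equations AᵀA x = Aᵀb. Below is a minimal two-unknown illustration (the measurement values are invented, not the paper's cable data):

```python
def least_squares_2(A, b):
    """Least-squares solution of an overdetermined system A x = b with
    two unknowns, via the normal equations A^T A x = A^T b (the 2x2
    normal system is then solved by Cramer's rule)."""
    m = len(A)
    g00 = sum(A[i][0] * A[i][0] for i in range(m))
    g01 = sum(A[i][0] * A[i][1] for i in range(m))
    g11 = sum(A[i][1] * A[i][1] for i in range(m))
    h0 = sum(A[i][0] * b[i] for i in range(m))
    h1 = sum(A[i][1] * b[i] for i in range(m))
    det = g00 * g11 - g01 * g01
    return [(h0 * g11 - g01 * h1) / det, (g00 * h1 - g01 * h0) / det]
```

With three consistent measurements of y = 2 + 3t (rows [1, t], t = 0, 1, 2), the routine recovers [2, 3] exactly; with noisy measurements it returns the root-mean-square-optimal estimate instead.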

  • Research Article
  • 10.36994/2707-4110-2019-1-22-27
Modern computer technologies for modeling the life cycle of buildings based on parallel calculations
  • Jan 1, 2019
  • Visnyk Universytetu “Ukraina”

The article deals with high-performance computing (HPC) technology for problems of stress-strain analysis at all stages of the life cycle of buildings and structures: construction, operation and reconstruction. The results of numerical simulation of tall buildings are presented, using as the processor component software based on a new hybrid algorithm for solving systems of linear algebraic equations [1] with a symmetric positive-definite matrix that combines computation on multi-core processors and graphics accelerators. It has been found that, to accelerate the calculations, hybrid systems that combine multi-core CPUs with accelerator coprocessors, including GPUs, are promising [5]. To test the effectiveness of the proposed parallel algorithm for solving systems of linear algebraic equations [1], numerical experiments were carried out for the most dangerous loads on a 27-story building. Results of numerical research using the LIRA-SAPR software complex for the preprocessor (input of initial data) and postprocessor (output of calculation results) stages are presented [2, 4, 6]. The results of numerical studies of the behavior of structures of tall buildings have shown a multiple reduction in the time of solving systems of linear algebraic equations with symmetric matrices on multiprocessor (multi-core) computers with graphics accelerators using the proposed hybrid algorithms [1]. High-performance technologies based on parallel calculations give even greater effect for more complex processes: modeling the life cycle of tall buildings, bridges, and especially complex structures of NPPs, etc., for static and dynamic loads, including emergencies, in normal and difficult geological conditions, which make up 70% of Ukraine's territory.

  • Research Article
  • 10.14529/cmse200203
Parallel solution of systems of linear equations on a hybrid CPU+GPU architecture
  • May 1, 2020
  • Bulletin of the South Ural State University. Series "Computational Mathematics and Software Engineering"
  • С Недожогин + 3 more

The article discusses the parallel implementation of solving systems of linear algebraic equations on computational nodes containing a central processing unit (CPU) and graphics accelerators (GPUs). The performance of parallel algorithms for the classical conjugate gradient method schemes when using the CPU and GPU together is significantly limited by the synchronization points. The article investigates the pipeline version of the conjugate gradient method with one synchronization point, the possibility of asynchronous calculations, and load balancing between the CPU and GPU when solving large linear systems. Numerical experiments were carried out on test matrices and computational nodes of different performance of a heterogeneous cluster, which allowed us to estimate the contribution of communication costs. The algorithms are implemented with the joint use of MPI, OpenMP and CUDA. The proposed algorithms, in addition to reducing the execution time, allow solving large linear systems for which the memory resources of one GPU or one computing node are not sufficient. At the same time, the pipelined block algorithm decreases the total execution time by reducing the number of synchronization points and aggregating several messages into one.

  • Research Article
  • Cited by 2
  • 10.15407/pp2020.02-03.208
A hybrid Newton-method algorithm for solving systems of nonlinear equations with block Jacobi matrices
  • Sep 1, 2020
  • PROBLEMS IN PROGRAMMING
  • O.M Khimich + 2 more

Systems of nonlinear equations often arise when modeling processes of different nature. These can be both independent problems describing physical processes and problems arising at an intermediate stage of solving more complex mathematical problems. Usually, these are high-order tasks with a large number of unknowns, which better capture the local features of the process or object being modeled. In addition, more accurate discrete models allow for more accurate solutions. Usually, the matrices of such problems have a sparse structure; often it is one of the following: band, profile, block-diagonal with bordering, etc. In many cases, the matrices of the discrete problems are symmetric and positive definite or semi-definite. The solution of systems of nonlinear equations is performed mainly by iterative methods based on the Newton method, which has a high (quadratic) convergence rate near the solution, provided that the initial approximation lies in the basin of attraction of the solution. In this case, the method requires, at each iteration, computing the Jacobi matrix and then solving a system of linear algebraic equations. As a consequence, the cost of a single iteration is high. Using parallel computations at the stage of solving the systems of linear algebraic equations greatly accelerates the process of finding the solution of systems of nonlinear equations. In the paper, a new method for solving systems of nonlinear high-order equations with a block Jacobi matrix is proposed. The basis of the new method is to combine the classical algorithm of the Newton method with an efficient small-tile algorithm for solving systems of linear equations with sparse matrices. Timings for solving systems of nonlinear equations of different orders on nodes of the SKIT supercomputer are given.
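The Newton iteration described above solves J(x) Δx = −F(x) at every step; the linear solve is exactly where the paper's parallel block sparse solver plugs in. A minimal dense 2×2 sketch (the test system x² + y² = 5, x·y = 2 is an invented example, not from the paper):

```python
def newton_2d(F, J, x, iters=20):
    """Newton's method for a 2-equation nonlinear system: at each step
    solve the 2x2 linear system J(x) * dx = -F(x) (Cramer's rule here;
    a large sparse solver in the paper's setting), then update x."""
    for _ in range(iters):
        f0, f1 = F(x)
        (a, b), (c, d) = J(x)
        det = a * d - b * c
        dx0 = (-f0 * d + f1 * b) / det
        dx1 = (-f1 * a + f0 * c) / det
        x = [x[0] + dx0, x[1] + dx1]
    return x

# invented test system: x^2 + y^2 = 5, x*y = 2  (one root is (2, 1))
F = lambda v: (v[0] ** 2 + v[1] ** 2 - 5.0, v[0] * v[1] - 2.0)
J = lambda v: ((2 * v[0], 2 * v[1]), (v[1], v[0]))
```

Starting from (2.0, 0.5), the iteration converges quadratically to the root (2, 1); in the high-order setting the paper targets, almost all the time per iteration is spent inside the linear solve, which is why parallelizing it pays off.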

  • Research Article
  • Cited by 2
  • 10.34229/2707-451x.20.2.6
Parallel Algorithms for Solving Linear Systems on Hybrid Computers
  • Jul 24, 2020
  • Cybernetics and Computer Technologies
  • Alexander Khimich + 2 more

Introduction. At present, in science and technology, new computational problems constantly arise with large volumes of data, the solution of which requires the use of powerful supercomputers. Most of these problems come down to solving systems of linear algebraic equations (SLAE). The main challenge of solving problems on a computer is to obtain reliable solutions with minimal computing resources. However, the problem that is solved on a computer always contains data that are approximate with respect to the original task (due to errors in the initial data, errors when entering numerical data into the computer, etc.). Thus, the mathematical properties of the computer problem can differ significantly from the properties of the original problem. It is necessary to solve problems taking the approximate data into account and to analyze the computer results. Despite significant results of research in the field of linear algebra, the problems of computer solution of problems with approximate data, further complicated by the use of contemporary supercomputers, retain their significance and require further development. Today, the most high-performance supercomputers are parallel ones with graphics processors. The architectural and technological features of these computers make it possible to significantly increase the efficiency of solving problems of large volumes at relatively low energy costs. The purpose of the article is to develop new parallel algorithms for solving systems of linear algebraic equations with approximate data on supercomputers with graphics processors that implement the automatic adjustment of the algorithms to the effective computer architecture and to the mathematical properties of the problem identified in the computer, as well as estimates of the reliability of the results. Results.
A methodology is described for creating parallel algorithms for supercomputers with graphics processors that implement the study of the mathematical properties of linear systems with approximate data, together with algorithms for analyzing the reliability of the results. The results of computational experiments on the SKIT-4 supercomputer are presented. Conclusions. Parallel algorithms have been created for investigating and solving linear systems with approximate data on supercomputers with graphics processors. Numerical experiments with the new algorithms showed a significant acceleration of calculations with a guarantee of the reliability of the results.

Keywords: systems of linear algebraic equations, hybrid algorithm, approximate data, reliability of the results, GPU computers.
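Why approximate input data matters can be seen on the smallest possible example: for an ill-conditioned matrix, a tiny perturbation of the right-hand side swings the computed solution completely, which is precisely what reliability analysis has to detect. The nearly singular matrix below is an invented illustration:

```python
def solve_2x2(A, b):
    """Direct 2x2 solve by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

A = [[1.0, 1.0], [1.0, 1.0001]]            # nearly singular
x_exact     = solve_2x2(A, [2.0, 2.0001])  # about [1, 1]
x_perturbed = solve_2x2(A, [2.0, 2.0002])  # about [0, 2]
```

A relative change of about 5·10⁻⁵ in one entry of b moves the solution from [1, 1] to [0, 2]: the condition number of A amplifies data errors by roughly four orders of magnitude, so a solution computed from approximate data is meaningless without an accompanying reliability estimate.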

  • Conference Article
  • Cited by 1
  • 10.2514/6.1977-644
A fast implicit iterative numerical method for solving multidimensional partial differential equations
  • Jun 27, 1977
  • W Helliwell

A new technique for solving the large system of linear algebraic equations associated with implicit differencing of multidimensional partial differential equations is presented. The coefficient matrix of the equations is factored, and then approximations to certain terms in the matrix are obtained from series expansions. The resulting system of equations is solved easily. The method is developed and demonstrated using a simple representative two-dimensional equation. Very good results are obtained when one direction is dominant. Implicit finite-difference schemes for the solution of multidimensional partial differential equations are usually stable and therefore applicable to a large class of problems. However, they are difficult to implement and may require an excessive amount of computer storage and time. The long computing time arises from the need to solve the large system of linear algebraic equations that result from the differencing. The computing time can be reduced significantly by approximating the coefficient matrix of the linear equations with a matrix that produces a system of equations that are relatively easy to solve. Among such methods are the alternating direction implicit method (ADI) used by Beam and Warming and Stone's strongly implicit method, which has been tested by Lin et al. In this paper, a new technique for solving the large system of linear algebraic equations associated with implicit differencing of multidimensional partial differential equations is presented. This method, called the pseudo-elimination method (PE), is shown to be faster than Stone's method for certain problems. The method is directly applicable to linear and linearized nonlinear systems of parabolic or elliptic partial differential equations. In order to discuss the method, a simple linear partial differential equation will be used; however, it should be kept in mind that the PE method is applicable to much more complicated problems.
The question of whether the method will work when applied to difficult problems is not addressed. The scope of this paper is limited to presenting the method and illustrating, via a simple problem, that the method has some merit and deserves further study.
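Methods like ADI owe their speed to the fact that each directional sweep reduces the implicit system to independent tridiagonal systems, which the Thomas algorithm solves in O(n). A minimal sketch of that building block (an illustration of what "relatively easy to solve" means here, not the paper's pseudo-elimination method itself):

```python
def thomas(a, d, c, b):
    """Solve a tridiagonal system in O(n).
    a = sub-diagonal (n-1 entries), d = diagonal (n entries),
    c = super-diagonal (n-1 entries), b = right-hand side (n entries)."""
    n = len(d)
    cp = [0.0] * n                 # modified super-diagonal
    bp = [0.0] * n                 # modified right-hand side
    cp[0] = c[0] / d[0]
    bp[0] = b[0] / d[0]
    for i in range(1, n):          # forward elimination
        m = d[i] - a[i - 1] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        bp[i] = (b[i] - a[i - 1] * bp[i - 1]) / m
    x = [0.0] * n
    x[-1] = bp[-1]
    for i in range(n - 2, -1, -1): # back substitution
        x[i] = bp[i] - cp[i] * x[i + 1]
    return x
```

For the system with diagonal [2, 2, 2] and off-diagonals [1, 1], right-hand side [3, 4, 3], the solver returns [1, 1, 1]; in an ADI sweep, one such solve is performed per grid line, and the lines are independent, which is also what makes the sweeps easy to parallelize.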

  • Research Article
  • Cited by 3
  • 10.2514/3.60955
A Fast Implicit Iterative Numerical Method for Solving Multidimensional Partial Differential Equations
  • Jul 1, 1978
  • AIAA Journal
  • William S Helliwell

