Speedup Research Articles

An algorithm is presented for the coupled-cluster singles, doubles, and perturbative triples correction [CCSD(T)] method based on the density-fitting or resolution-of-the-identity (RI) approximation for performing calculations on heterogeneous computing platforms composed of multicore CPUs and graphics processing units (GPUs). The directive-based approach to GPU offloading offered by the OpenMP application programming interface has been employed to adapt the most compute-intensive terms in the RI-CCSD amplitude equations, with computational costs scaling as O(N_O^2 N_V^4), O(N_O^3 N_V^3), and O(N_O^4 N_V^2) (where N_O and N_V denote the numbers of correlated occupied and virtual orbitals, respectively), and the perturbative triples correction to execute on GPU architectures. The pertinent tensor contractions are performed using an accelerated math library such as cuBLAS or hipBLAS. Optimal strategies are discussed for splitting large data arrays into tiles to fit them into the relatively small memory space of the GPUs, while also minimizing the low-bandwidth CPU-GPU data transfers. The performance of the hybrid CPU-GPU RI-CCSD(T) code is demonstrated on pre-exascale supercomputers composed of heterogeneous nodes equipped with NVIDIA Tesla V100 and A100 GPUs and on the world's first exascale supercomputer, "Frontier", the nodes of which are equipped with AMD MI250X GPUs. Speedups within the range 4-8× relative to the recently reported CPU-only algorithm are obtained for the GPU-offloaded terms in the RI-CCSD amplitude equations. Applications to polycyclic aromatic hydrocarbons containing 16-66 carbon atoms demonstrate that the acceleration of the hybrid CPU-GPU code for the perturbative triples correction relative to the CPU-only code increases with the molecule size, attaining a speedup of 5.7× for the largest circumovalene molecule (C66H20). The GPU-offloaded code enables the computation of the perturbative triples correction for the C60 molecule using the cc-pVDZ/aug-cc-pVTZ-RI basis sets in 7 min on Frontier when using 12,288 AMD GPUs with a parallel efficiency of 83.1%.
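The speedup and parallel-efficiency figures quoted above follow the usual definitions, which can be sketched as below. This is a minimal illustration only: the function names and all timings are hypothetical, and the abstract does not state which baseline run the 83.1% efficiency is measured against, so the strong-scaling baseline here is an assumption.

```python
def speedup(t_reference: float, t_accelerated: float) -> float:
    """Ratio of the reference wall time to the accelerated wall time."""
    if t_reference <= 0 or t_accelerated <= 0:
        raise ValueError("wall times must be positive")
    return t_reference / t_accelerated


def parallel_efficiency(t_base: float, n_base: int,
                        t_scaled: float, n_scaled: int) -> float:
    """Strong-scaling efficiency of a run on n_scaled devices,
    measured against a baseline run on n_base devices."""
    if n_base <= 0 or n_scaled <= 0:
        raise ValueError("device counts must be positive")
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("wall times must be positive")
    # Wall time the scaled run would take if scaling were perfect.
    ideal_time = t_base * n_base / n_scaled
    return ideal_time / t_scaled


if __name__ == "__main__":
    # Hypothetical timings (not taken from the article).
    cpu_only_s, cpu_gpu_s = 570.0, 100.0
    print(f"speedup: {speedup(cpu_only_s, cpu_gpu_s):.1f}x")

    # Hypothetical baseline: 40 min on 1,536 GPUs vs. 6 min on 12,288 GPUs.
    eff = parallel_efficiency(40.0, 1536, 6.0, 12288)
    print(f"parallel efficiency: {eff:.1%}")
```

An efficiency below 100% means the scaled run takes longer than the perfectly scaled baseline would; communication and load imbalance are typical causes.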

Background: Simulating cardiac function requires the numerical solution of multi-physics and multi-scale mathematical models. This underscores the need for streamlined, accurate, and high-performance computational tools. Despite the dedicated efforts of various research teams, comprehensive and user-friendly software for cardiac simulations, capable of accurately replicating both normal and pathological conditions, has not yet reached full maturity within the scientific community.

Results: This work introduces lifex-ep, a publicly available software for numerical simulations of the electrophysiological activity of the cardiac muscle, under both normal and pathological conditions. lifex-ep employs the monodomain equation to model the heart's electrical activity. It incorporates both phenomenological and second-generation ionic models. These models are discretized using the Finite Element method on tetrahedral or hexahedral meshes. Additionally, lifex-ep integrates the generation of myocardial fibers based on Laplace–Dirichlet Rule-Based Methods, previously released in Africa et al., 2023, within lifex-fiber. As an alternative, users can also choose to import myofibers from a file. This paper provides a concise overview of the mathematical models and numerical methods underlying lifex-ep, along with comprehensive implementation details and instructions for users. lifex-ep features exceptional parallel speedup, scaling efficiently when using up to thousands of cores, and its implementation has been verified against an established benchmark problem for computational electrophysiology. We showcase the key features of lifex-ep through various idealized and realistic simulations conducted in both normal and pathological scenarios. Furthermore, the software offers a user-friendly and flexible interface, simplifying the setup of simulations using self-documenting parameter files.

Conclusions: lifex-ep provides easy access to cardiac electrophysiology simulations for a wide user community. It offers a computational tool that integrates models and accurate methods for simulating cardiac electrophysiology within a high-performance framework, while maintaining a user-friendly interface. lifex-ep represents a valuable tool for conducting in silico patient-specific simulations.

Articles published on Speedup

Refactoring BZIP2 on the new-generation Sunway supercomputer

AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks

Dimensioning the pending interest table in content-centric networks

Local Second-Order Møller-Plesset Theory with a Single Threshold Using Orthogonal Virtual Orbitals: Theory, Implementation, and Assessment.

Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.

Virtual lattice method for efficient Monte Carlo transport simulation of dispersion nuclear fuels

A massive MPI parallel framework of smoothed particle hydrodynamics with optimized memory management for extreme mechanics problems

An improved dynamical Poisson equation solver for self-gravity

Lifex-ep: a robust and efficient software for cardiac electrophysiology simulations

Two different parallel approaches for a hybrid fractional order Coronavirus model

Enhanced Ant Colony Algorithm for Discrete Dynamic Berth Allocation in a Case Container Terminal

Analysis of the Influence Path of Confucianism in the Civic Education of Contemporary College Students in the Context of Big Data

Fast parallel IGA-ADS solver for time-dependent Maxwell's equations

Computational performance of musculoskeletal simulation in OpenSim Moco using parallel computing.

Design and Development of Effective Multi-Level Cache Memory Model

Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations

Approximating Inverse Cumulative Distribution Functions to Produce Approximate Random Variables

Exploring model complexity in machine learned potentials for simulated properties

A fast, dense Chebyshev solver for electronic structure on GPUs.

Domain pattern formation in tetragonal ferroelectric ceramics
