Abstract

A completely integral-direct coupled-cluster singles, doubles, and perturbative triples [CCSD(T)] implementation that is economical in both disk I/O and network traffic has been developed, relying on the density-fitting approximation. By fully exploiting the permutational symmetry, the presented algorithm is highly efficient in both operation count and memory use. Our measurements demonstrate excellent strong scaling achieved via hybrid MPI/OpenMP parallelization and a highly competitive, 60-70% utilization of the theoretical peak performance on up to hundreds of cores. The terms whose evaluation time becomes significant only for small- to medium-sized examples have also been extensively optimized. Consequently, high performance is also expected for systems appearing in extensive data sets used, e.g., for density functional or machine learning parametrizations, and in calculations required for certain reduced-cost or local approximations of CCSD(T), such as in our local natural orbital scheme [LNO-CCSD(T)]. The efficiency of this implementation allowed us to perform some of the largest CCSD(T) calculations ever presented for systems of 31-43 atoms and 1037-1569 orbitals using only four to eight many-core CPUs and 1-3 days of wall time. The resulting 13 correlation energies and the 12 corresponding reaction energies and barrier heights are added to our previous benchmark set, which collects reference CCSD(T) results for molecules at the applicability limit of current implementations.

Highlights

  • The coupled-cluster (CC)[1−4] family of methods has become one of the most accurate and versatile theoretical tools to simulate molecules and solids at the atomic scale

  • For a significant portion of the target use cases, when the virtual/occupied ratio is in the range of 5−10, the usual assumption that the $\mathcal{O}(n_v^4 n_o^2/4)$-scaling particle−particle ladder (PPL) term dominates the cost of the CC model with single and double excitations (CCSD) does not hold

  • We improved upon a previous t1-transformed CCSD algorithm,[28] for instance, by optimizing and parallelizing all contractions besides the usually emphasized particle−particle ladder term
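
The second highlight can be illustrated with a rough operation-count estimate. The sketch below is not taken from the paper: it compares the PPL count quoted above (fourth power of the number of virtual orbitals times the square of the number of occupied orbitals, divided by four) against the CCSD contractions that scale with the cube of both orbital counts. The lumped `prefactor` for those terms is a hypothetical placeholder, as are the function names and the choice of 50 occupied orbitals.

```python
def ppl_ops(n_occ, n_virt):
    """Leading-order PPL operation count, n_virt^4 * n_occ^2 / 4, as quoted in the text."""
    return n_virt**4 * n_occ**2 / 4


def o3v3_ops(n_occ, n_virt, prefactor):
    """Operation count of the terms scaling as n_occ^3 * n_virt^3.

    `prefactor` is a hypothetical lumped constant, not a value from the paper.
    """
    return prefactor * n_occ**3 * n_virt**3


def ppl_share(n_occ, n_virt, prefactor=2.0):
    """Fraction of the combined estimate spent in the PPL term."""
    ppl = ppl_ops(n_occ, n_virt)
    return ppl / (ppl + o3v3_ops(n_occ, n_virt, prefactor))


if __name__ == "__main__":
    n_occ = 50  # assumed number of occupied orbitals, for illustration only
    for ratio in (5, 10, 20):
        share = ppl_share(n_occ, ratio * n_occ)
        print(f"virtual/occupied ratio {ratio:2d}: PPL share = {share:.2f}")
```

In this model the PPL share depends only on the virtual/occupied ratio r and the assumed prefactor c, as (r/4)/(r/4 + c), so for ratios of 5−10 the remaining contractions stay comparable to the PPL term unless c is very small.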


Summary

INTRODUCTION

The coupled-cluster (CC)[1−4] family of methods has become one of the most accurate and versatile theoretical tools to simulate molecules and solids at the atomic scale. Recent cost reduction efforts considered promising ideas utilizing, for example, sparsity exploitation,[45] mixed single and double precision operations,[46] or stochastic approaches.[47,48] The most successful methods to date combine multiple strategies, such as DF and NOs, with local approximations.[21,49−51] For instance, we have recently demonstrated with our local natural orbital (LNO)[21,52−57] scheme that, while retaining high accuracy, LNO-CCSD(T) calculations can be performed for systems of up to a few thousand atoms and 45 000 orbitals even with a single CPU.[21,57] As CC methods with local and NO approximations become increasingly accepted and trusted in the literature,[58] tightly converged approximations have mostly taken over the role of massively parallel implementations in large-scale CCSD(T) applications. Considering this shift, we identify three use cases and the corresponding algorithmic properties at which our optimization efforts are aimed.

THEORETICAL BACKGROUND
ALGORITHM
CCSD Algorithm
COMPUTATIONAL DETAILS
PERFORMANCE ANALYSIS
APPLICATIONS
SUMMARY AND OUTLOOK
■ APPENDIX
■ ACKNOWLEDGMENTS
Findings
■ REFERENCES