We provide an updated version of the program hex-ecs originally presented in Comput. Phys. Commun. 185 (2014) 2903–2912. The original version used an iterative method preconditioned by the incomplete LU factorization (ILU), which, though very stable and predictable, requires a large amount of working memory. In the new version we implemented a “separated electrons” (or “Kronecker product approximation”, KPA) preconditioner as suggested by Bar-On et al., Appl. Num. Math. 33 (2000) 95–104. This preconditioner has much lower memory requirements, but in return it needs more iterations to reach converged results. By choosing carefully between the ILU and KPA preconditioners, one can extend the computational feasibility to larger calculations. Second, we added the option to run the KPA preconditioner on an OpenCL device (e.g. a GPU). GPUs generally have faster memory access, which particularly speeds up the sparse matrix multiplication.

New version program summary
Program title: hex-ecs
Catalogue identifier: AETI_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETI_v2_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: MIT License
No. of lines in distributed program, including test data, etc.: 73693
No. of bytes in distributed program, including test data, etc.: 520475
Distribution format: tar.gz
Programming language: C++11
Computer: Any recent CPU, preferably 64-bit. Computationally intensive parts can be run on a GPU (tested on AMD Tahiti and NVIDIA Titan X models).
Operating system: Tested on Windows 10 and various Linux distributions.
RAM: Depends on the problem solved and the particular setup; the KPA test run uses approx. 300 MiB.
Classification: 2.4
Catalogue identifier of previous version: AETI_v1_0
Journal reference of previous version: Comput. Phys. Commun. 185 (2014) 2903
External routines: GSL [1], UMFPACK [2], BLAS and LAPACK (ideally threaded OpenBLAS [3])
Does the new version supersede the previous version?: Yes
Nature of problem: Solution of the two-particle Schrödinger equation in a central field.
Solution method: The two-electron states are expanded into angular momentum eigenstates, which gives rise to a set of coupled bi-radial equations. The bi-radial solution is then represented in a B-spline product basis, which transforms the set of equations into a large matrix equation. Thanks to the exterior complex scaling method, which extends the coordinates into the complex plane, the boundary condition is of Dirichlet type. The matrix equation is then solved by the preconditioned conjugate orthogonal conjugate gradient method (PCOCG) [4].
Reasons for new version: The original program has been updated to achieve better performance. Also, some external dependencies have been removed (HDF5, FFTW3), which simplifies deployment.
Summary of revisions: We implemented a new preconditioner introduced in [5], both for general CPUs and for an arbitrary OpenCL device (e.g. a GPU) conforming to the OpenCL 2.0 specification. Furthermore, many other minor improvements have been made, mainly to reduce the memory requirements. With appropriate switches, the program no longer precomputes the matrices and instead calculates their elements on the fly. This is aided by the vectorized B-spline evaluation function, which can now use AVX instructions when a single B-spline is evaluated at several points. The accompanying tools hex-db and hex-dwba [6] have also been updated to use the shared code base.
Running time: KPA test run, approx. 2 minutes on Intel i7-4790K (4 threads)
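The PCOCG iteration named in the solution method works with complex symmetric matrices (A^T = A, not Hermitian), which is what exterior complex scaling produces. It replaces the Hermitian inner product with the unconjugated bilinear form u^T v. The following is a minimal sketch of that iteration under explicit assumptions. It uses a small dense matrix and a diagonal (Jacobi) preconditioner standing in for the ILU or KPA preconditioners of hex-ecs. The names pcocg, dot_t, norm2 and matvec are hypothetical. It has no safeguard against the possible breakdown of the method, which occurs when r^T z = 0 or p^T A p = 0 for nonzero vectors.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using CVector = std::vector<cplx>;
using CMatrix = std::vector<CVector>;

// Unconjugated bilinear form u^T v, used by COCG in place of the Hermitian inner product.
cplx dot_t(const CVector& u, const CVector& v)
{
    cplx s = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
    return s;
}

// Euclidean (conjugated) norm, used only for the convergence test.
double norm2(const CVector& v)
{
    double s = 0.0;
    for (const cplx& x : v) s += std::norm(x);  // std::norm(x) = |x|^2
    return std::sqrt(s);
}

// Dense matrix-vector product y = A x.
CVector matvec(const CMatrix& A, const CVector& x)
{
    CVector y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

// Hypothetical preconditioned COCG for a complex symmetric A, starting from x = 0.
// Returns the number of iterations performed (max_iter if not converged).
int pcocg(const CMatrix& A, const CVector& b, CVector& x,
          double tol = 1e-12, int max_iter = 1000)
{
    const std::size_t n = b.size();
    x.assign(n, 0.0);
    CVector r = b, z(n), p(n);
    // z = M^{-1} r with M = diag(A); in hex-ecs the ILU or KPA preconditioner takes this role
    auto precondition = [&A, n](const CVector& in, CVector& out) {
        for (std::size_t i = 0; i < n; ++i) out[i] = in[i] / A[i][i];
    };
    precondition(r, z);
    p = z;
    cplx rho = dot_t(r, z);
    const double bnorm = norm2(b);
    for (int k = 0; k < max_iter; ++k)
    {
        if (norm2(r) <= tol * bnorm) return k;
        const CVector q = matvec(A, p);
        const cplx alpha = rho / dot_t(p, q);  // division by p^T A p: breakdown if it vanishes
        for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        precondition(r, z);
        const cplx rho_new = dot_t(r, z);
        const cplx beta = rho_new / rho;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rho = rho_new;
    }
    return max_iter;
}
```

In this sketch, switching from one preconditioner to another only changes the body of precondition. The outer iteration stays the same.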
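The low memory cost of a “separated electrons” preconditioner can be seen on a model problem. Here the electron-electron coupling is dropped and the two-electron operator is written as the Kronecker sum E I⊗I − H⊗I − I⊗H of one-electron matrices. Suppose H = C D C^T with orthogonal C. Then (C⊗C)^T applied to this operator from the left, and (C⊗C) from the right, gives a diagonal matrix with entries E − ε_i − ε_j. Inverting it therefore takes only two basis transforms and an elementwise division. The following is a minimal sketch of this idea, not the hex-ecs implementation. It assumes a real symmetric H in an orthonormal basis (overlap S = I). In hex-ecs, exterior complex scaling makes the matrices complex symmetric, and the B-spline overlap is not the identity, so a generalized eigenproblem normalized by C^T S C = I would be needed. The names kpa_solve and jacobi_eigen are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

Matrix transpose(const Matrix& A)
{
    Matrix T(A[0].size(), std::vector<double>(A.size()));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[0].size(); ++j)
            T[j][i] = A[i][j];
    return T;
}

Matrix matmul(const Matrix& A, const Matrix& B)
{
    Matrix C(A.size(), std::vector<double>(B[0].size(), 0.0));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t k = 0; k < B.size(); ++k)
            for (std::size_t j = 0; j < B[0].size(); ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Cyclic Jacobi eigenvalue algorithm for a real symmetric matrix A (taken by value).
// On return, eps holds the eigenvalues and the columns of V the orthonormal eigenvectors.
void jacobi_eigen(Matrix A, std::vector<double>& eps, Matrix& V)
{
    const std::size_t n = A.size();
    V.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) V[i][i] = 1.0;
    for (int sweep = 0; sweep < 100; ++sweep)
    {
        double off = 0.0;  // sum of squared off-diagonal elements
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) off += A[p][q] * A[p][q];
        if (off < 1e-30) break;
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q)
            {
                if (std::abs(A[p][q]) < 1e-300) continue;
                // rotation angle chosen so that the rotated A[p][q] vanishes
                const double theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q]);
                const double t = (theta >= 0 ? 1.0 : -1.0) / (std::abs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (std::size_t k = 0; k < n; ++k)  // A <- A J (columns p, q)
                {
                    const double akp = A[k][p], akq = A[k][q];
                    A[k][p] = c * akp - s * akq; A[k][q] = s * akp + c * akq;
                }
                for (std::size_t k = 0; k < n; ++k)  // A <- J^T A (rows p, q)
                {
                    const double apk = A[p][k], aqk = A[q][k];
                    A[p][k] = c * apk - s * aqk; A[q][k] = s * apk + c * aqk;
                }
                for (std::size_t k = 0; k < n; ++k)  // accumulate V <- V J
                {
                    const double vkp = V[k][p], vkq = V[k][q];
                    V[k][p] = c * vkp - s * vkq; V[k][q] = s * vkp + c * vkq;
                }
            }
    }
    eps.resize(n);
    for (std::size_t i = 0; i < n; ++i) eps[i] = A[i][i];
}

// Hypothetical KPA step: solve E*X - H*X - X*H = B, i.e. the Kronecker-sum system
// (E I kron I - H kron I - I kron H) vec(X) = vec(B) with row-major vec and symmetric H,
// where X[i][j] couples basis function i of electron 1 with basis function j of electron 2.
Matrix kpa_solve(const Matrix& H, double E, const Matrix& B)
{
    std::vector<double> eps;
    Matrix C;
    jacobi_eigen(H, eps, C);              // H = C diag(eps) C^T, C^T C = I
    const Matrix Ct = transpose(C);
    Matrix Y = matmul(matmul(Ct, B), C);  // right-hand side in the eigenbasis
    for (std::size_t i = 0; i < Y.size(); ++i)
        for (std::size_t j = 0; j < Y.size(); ++j)
            Y[i][j] /= E - eps[i] - eps[j];  // the operator is diagonal in this basis
    return matmul(matmul(C, Y), Ct);      // transform back
}
```

Only n×n one-electron matrices are stored here, not a factorization of the n²×n² two-electron matrix. In an actual iterative solve, the eigendecomposition would be computed once and reused in every preconditioner application.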
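The summary notes that evaluating a single B-spline at several points can use vector instructions. This can be illustrated with the Cox–de Boor recursion arranged so that the innermost loop runs over the evaluation points. That data layout lets a compiler emit SIMD instructions such as AVX. The following is a hypothetical helper (bspline_eval), not the hex-ecs function. It contains no intrinsics and relies on auto-vectorization. It uses half-open knot spans [t_j, t_{j+1}), so the B-spline evaluates to zero at the last knot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: evaluate the single B-spline B_{i,k} (order k, degree k-1)
// on the knot vector t at all points xs with the Cox-de Boor recursion.
// Requires t.size() >= i + k + 1.
std::vector<double> bspline_eval(const std::vector<double>& t, int i, int k,
                                 const std::vector<double>& xs)
{
    const std::size_t m = xs.size();
    // work[j][p] holds B_{i+j,ord}(xs[p]) for the current order ord, j = 0 .. k-ord
    std::vector<std::vector<double>> work(k, std::vector<double>(m, 0.0));

    // Order 1: indicator functions of the half-open knot spans.
    for (int j = 0; j < k; ++j)
        for (std::size_t p = 0; p < m; ++p)
            work[j][p] = (t[i + j] <= xs[p] && xs[p] < t[i + j + 1]) ? 1.0 : 0.0;

    // Raise the order; zero-length spans (repeated knots) get a zero weight.
    for (int ord = 2; ord <= k; ++ord)
        for (int j = 0; j + ord <= k; ++j)
        {
            const double tl = t[i + j], tr = t[i + j + ord];
            const double dl = t[i + j + ord - 1] - tl, dr = tr - t[i + j + 1];
            const double sl = dl > 0 ? 1.0 / dl : 0.0, sr = dr > 0 ? 1.0 / dr : 0.0;
            // Innermost loop over points, branch-free: the SIMD-friendly layout.
            // work[j + 1] still holds the previous order because j increases.
            for (std::size_t p = 0; p < m; ++p)
                work[j][p] = (xs[p] - tl) * sl * work[j][p] + (tr - xs[p]) * sr * work[j + 1][p];
        }
    return work[0];
}
```

For example, on the uniform knots 0, 1, 2, 3, the quadratic (order 3) B-spline evaluates to 0.125, 0.75 and 0.125 at x = 0.5, 1.5 and 2.5.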