In this updated version of ZMCintegral, we have added the functionality of parameter scan for integrations with a large parameter space (up to $10^{10}$ points to be scanned). The Python API is kept the same as in the previous versions, and users retain full flexibility to define their own integrands. The performance of the new functionality is tested in multi-node conditions.

Program summary

Program Title: ZMCintegral

Program Files doi: http://dx.doi.org/10.17632/p7wc7k6mpp.2

Licensing provisions: Apache-2.0

Programming language: Python

Journal reference of previous version: Comput. Phys. Commun. 248 (2020) 106962

Does the new version supersede the previous version?: Yes

Reasons for the new version: For many physical cases, one usually encounters integrations containing a large parameter space [1,2,3,4]. In these conditions, integrations of the same form (but with different parameters) need to be evaluated millions of times. However, our previous versions [5] mainly focus on a single integration of high dimension and hence lack good performance for many integrations of relatively high dimension (e.g., the scan of a parameter space). Therefore, we have added the much-needed functionality of parameter scan. For each point in the parameter space, an integration is evaluated, and all these integrations are allocated automatically.

Summary of revisions:

• Integration with a parameter space. We use the six-dimensional integration
\begin{equation}
f(x_1,x_2,x_3,x_4)=\left(\prod_{k=1}^{6}\int_0^1 \mathrm{d}y_k\right)\sin\left(\sum_{j=1}^{6}y_j+\sum_{l=1}^{4}x_l\right) \tag{1}
\end{equation}
to demonstrate the functionality of parameter scan. Here, each component $x_l \in \{x_1,x_2,x_3,x_4\}$ of a parameter point takes values in the list $[0,1,2,\dots,99]$, thus forming a parameter space of size $100^4$ (i.e., $10^8$ points to be scanned). The allocation of the thread and block configurations is handled by the code automatically.

• Test on multiple (three) nodes. The hardware of the three nodes is: an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz with 24 processors and 4 Nvidia Tesla K40m GPUs; an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz with 10 processors and 2 Nvidia Tesla K80 GPUs; and an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz with 10 processors and 1 Nvidia Tesla V100 GPU. The three nodes are connected in a local area network. We perform the integration of Eq. (1) 10 times independently; the results are shown in Table 1. Note that the accuracy of the integrations depends purely on the number of samples in the direct Monte Carlo method.

Nature of problem: ZMCintegral is an easy-to-use Python package for performing high-dimensional integrations on distributed GPU clusters. With the Python libraries Numba [6] and Ray [7], as well as NVIDIA CUDA [8], ZMCintegral provides a succinct Python interface for evaluating numerical integrations in physical problems. In this new version, we have mainly focused on one kind of problem in which the integration contains a large parameter space (up to $10^{10}$ parameter points).

Solution method: This new version of ZMCintegral contains two Python classes. One class (ZMCintegral_normal, similar to our previous version), which is for high-dimensional integration, uses stratified-sampling and heuristic-tree-search techniques. The other class (ZMCintegral_functional, the new functionality) uses the direct Monte Carlo method and distributes the integration tasks (each corresponding to a different parameter point) across the available devices automatically. In ZMCintegral_functional, the integration is performed on a single GPU thread, and each thread gives the result of the integration for one specific parameter point.
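As a rough illustration of this one-thread-per-parameter-point scheme (this is not the ZMCintegral API; the kernel name, sample count, and thread/block settings below are chosen only for this sketch, whereas ZMCintegral_functional handles them automatically), a direct Monte Carlo scan of Eq. (1) over a small slice of the parameter grid could be written with Numba CUDA as follows:

```python
# Conceptual sketch only, NOT the ZMCintegral_functional API: each GPU thread
# estimates the integral of Eq. (1) for one parameter point (x_1, ..., x_4)
# by direct Monte Carlo sampling over the unit hypercube.
import math
import numpy as np
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float64

DIM = 6             # integration variables y_1, ..., y_6 of Eq. (1)
N_SAMPLES = 10_000  # Monte Carlo samples per parameter point (illustrative only)

@cuda.jit
def scan_kernel(params, rng_states, results):
    # One thread = one parameter point.
    tid = cuda.grid(1)
    if tid < params.shape[0]:
        x_sum = 0.0
        for l in range(params.shape[1]):
            x_sum += params[tid, l]
        acc = 0.0
        for i in range(N_SAMPLES):
            y_sum = 0.0
            for k in range(DIM):
                y_sum += xoroshiro128p_uniform_float64(rng_states, tid)
            acc += math.sin(y_sum + x_sum)
        # The unit hypercube has volume 1, so the estimate is the sample mean.
        results[tid] = acc / N_SAMPLES

# A tiny 5^4 slice of the full 100^4 grid, just to keep the example small.
grid_1d = np.arange(5.0)
params = np.array(np.meshgrid(grid_1d, grid_1d, grid_1d, grid_1d)).T.reshape(-1, 4)

threads_per_block = 64
blocks = (params.shape[0] + threads_per_block - 1) // threads_per_block
rng_states = create_xoroshiro128p_states(params.shape[0], seed=42)
results = cuda.device_array(params.shape[0])

scan_kernel[blocks, threads_per_block](cuda.to_device(params), rng_states, results)
print(results.copy_to_host()[:10])  # one integral estimate per parameter point
```

In the actual package, the user only supplies the Python integrand and the list of parameter points; the allocation over GPU threads, GPUs, and nodes is handled by the code automatically, as described above.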
Additional comments: If the integrations are high-dimensional (e.g., dimensionality of 8–12), we encourage users to use ZMCintegral_normal. If the integrations are of medium dimension (e.g., dimensionality of 1–7) but have a large parameter space, we suggest that users try ZMCintegral_functional. Detailed instructions can be found here: [9].

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment: The authors are supported in part by the Major State Basic Research Development Program (973 Program) in China under Grant No. 2015CB856902 and by the National Natural Science Foundation of China (NSFC) under Grant No. 11535012. The computations were performed on the GPU servers of the Department of Modern Physics at USTC. We are thankful for the valuable discussions with Prof. Qun Wang of the Department of Modern Physics at USTC.