We present an adaptive Monte Carlo algorithm for computing the amplified spontaneous emission (ASE) flux in laser gain media pumped by pulsed lasers. With the design of high-power lasers in mind, which require large gain media, we have developed the open-source code HASEonGPU, which can utilize multiple graphics processing units (GPUs). With HASEonGPU, the time to solution is reduced to minutes on a medium-size GPU cluster of 64 NVIDIA Tesla K20m GPUs, and excellent speedup is achieved when scaling to multiple GPUs. Comparison of simulation results with measurements of ASE in Yb³⁺:YAG ceramics shows perfect agreement.

Program summary
Program title: HASEonGPU
Catalogue identifier: AFAM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFAM_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: GNU General Public License, version 3
No. of lines in distributed program, including test data, etc.: 84610
No. of bytes in distributed program, including test data, etc.: 3791861
Distribution format: tar.gz
Programming language: C++, MATLAB
Computer: GPU cluster or workstation with CUDA-capable GPUs (compute capability ≥ 2.0)
Operating system: Linux
Has the code been vectorized or parallelized?: Yes; it can utilize 1 CPU core per compatible GPU.
RAM: Several GB, depending on input size and number of GPUs; 4 GB (4000000000 bytes) per GPU is recommended.
Classification: 4.13, 6.5, 15
External routines: CUDA, Boost Program Options, OpenMPI
Nature of problem:
The algorithm described by D. Albach in [1, 2] uses ray-tracing techniques and Monte Carlo integration to calculate amplified spontaneous emission (ASE) with high precision. It requires a large number of sampling points as well as a large number of rays to reach the desired precision. Additionally, reflections on the upper and lower surfaces of the medium increase the workload by an order of magnitude.
On traditional CPU-based systems the computation is time-consuming, which limits the number of simulations that can be performed.
Solution method:
HASEonGPU uses a non-uniform distribution of sampling points within the gain medium to focus computation on areas of interest. This is further improved by combining the Monte Carlo integration with importance sampling [3]. To further reduce execution time, the algorithm is highly parallelized to run on a GPU and supports adaptive sampling resolutions and random restarts. It can also be executed on a GPU cluster, where linear scaling is achieved by coarse-grained load balancing that distributes the workload among all GPUs in a master-worker scheme over MPI.
Restrictions:
Presently, the number of rays used for the Monte Carlo integration of a single sampling point within the gain medium is limited by the available GPU memory (about 10⁸ rays per GB). Furthermore, when MPI is used for workload distribution, one of the MPI processes acts as a scheduling master, and its GPU cannot participate in the computation.
Unusual features:
The software can run on a workstation (threaded) as well as on a large-scale GPU cluster (MPI) that provides the required GPU hardware. The simulation parameters include polychromatic laser pulses as well as surface coatings, cladding, and refractive indices of the gain medium, which also allows reflections on the upper and lower surfaces of the medium to be simulated. If a desired mean square error is not reached with a set number of rays, the algorithm can automatically increase the number of rays to improve the results.
Additional comments:
The source code also includes a MATLAB script that can be used to call HASEonGPU directly from MATLAB code, integrating it into existing simulation setups. The distribution also includes examples of executing HASEonGPU from the command line and an example experiment that uses MATLAB and the provided script.
More detailed information can be found in the README file.
Running time:
Execution time varies strongly with the number of sampling points, the desired sampling resolution for each point, and the number of GPUs. A typical cylindrical gain medium of 6 cm diameter, simulated with 4210 non-uniformly distributed sampling points, can be computed with sufficient precision in 1 min on a single NVIDIA Tesla K20m GPU. Running time and precision can be tuned further through various parameters.
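The adaptive Monte Carlo idea in the Solution method and Unusual features entries (importance sampling, plus raising the ray count until a mean square error target is met) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not HASEonGPU's code: the 1-D integrand f(x) = 3x², the proposal density p(x) = 2x, the ray-doubling rule, and the names `estimateImportance` and `adaptiveEstimate` are all hypothetical, whereas HASEonGPU traces rays through a 3-D gain medium on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical toy example, not HASEonGPU code.
// Stand-in for the integral at one sampling point: f(x) = 3x^2 on [0, 1], exact integral 1.
double integrand(double x) { return 3.0 * x * x; }

// Hypothetical proposal density p(x) = 2x on [0, 1]: it draws more samples
// where f is large, which lowers the variance of the weight f(x)/p(x).
double proposalPdf(double x) { return 2.0 * x; }

struct Estimate {
    double value;      // Monte Carlo estimate of the integral
    double mse;        // estimated mean square error of the estimate (sample variance / N)
    std::size_t rays;  // number of samples ("rays") used
};

// Importance-sampled Monte Carlo estimate using n samples drawn from p(x).
Estimate estimateImportance(std::size_t n, std::mt19937_64& rng) {
    assert(n >= 2);  // the unbiased variance below needs at least two samples
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double sum = 0.0, sumSq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Inverse-CDF sampling of p(x) = 2x gives x = sqrt(u); using 1 - u
        // (in (0, 1]) keeps x > 0, so p(x) is never zero.
        double x = std::sqrt(1.0 - uni(rng));
        double w = integrand(x) / proposalPdf(x);  // importance weight f(x)/p(x)
        sum += w;
        sumSq += w * w;
    }
    double nd = static_cast<double>(n);
    double mean = sum / nd;
    double variance = (sumSq - nd * mean * mean) / (nd - 1.0);  // unbiased sample variance
    return {mean, variance / nd, n};
}

// Hypothetical adaptive rule: double the number of rays until the estimated MSE
// meets targetMse or the cap maxRays (e.g. a GPU-memory limit) is reached.
Estimate adaptiveEstimate(std::size_t startRays, std::size_t maxRays, double targetMse,
                          std::mt19937_64& rng) {
    Estimate est = estimateImportance(startRays, rng);
    while (est.mse > targetMse && est.rays < maxRays) {
        est = estimateImportance(std::min(est.rays * 2, maxRays), rng);
    }
    return est;
}
```

For example, `adaptiveEstimate(1000, std::size_t{1} << 20, 1e-6, rng)` with a seeded `std::mt19937_64 rng` keeps doubling the ray count until the MSE drops below 10⁻⁶. For this toy integrand the variance of the weight f/p is 0.125, versus 0.8 for uniform sampling (where the weight is f itself), so importance sampling needs roughly 6.4 times fewer rays for the same MSE. Here `maxRays` stands in for the GPU-memory cap on rays per sampling point noted under Restrictions.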
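The coarse-grained master-worker load balancing in the Solution method entry can be illustrated with a shared-memory sketch. It is a hedged stand-in, not HASEonGPU's implementation: MPI messages are replaced by a mutex-guarded `ChunkDispatcher` that plays the master's role, worker threads stand in for GPUs, and `runMasterWorker`, the chunk size, and the per-point `work` callback are hypothetical. Unlike the MPI version in the Restrictions entry, no worker is reserved as a master here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical sketch, not HASEonGPU code.
// Half-open range [begin, end) of sampling-point indices handed out as one work unit.
struct Chunk {
    std::size_t begin, end;
};

// Plays the master's role: hands out fixed-size chunks on request until none remain.
class ChunkDispatcher {
public:
    ChunkDispatcher(std::size_t totalPoints, std::size_t chunkSize)
        : total_(totalPoints), chunk_(std::max<std::size_t>(chunkSize, 1)) {}  // avoid 0-size chunks

    std::optional<Chunk> next() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (cursor_ >= total_) return std::nullopt;  // no work left: the worker stops
        Chunk c{cursor_, std::min(cursor_ + chunk_, total_)};
        cursor_ = c.end;
        return c;
    }

private:
    std::mutex mutex_;
    std::size_t total_, chunk_, cursor_ = 0;
};

// Starts numWorkers threads; each pulls chunks and stores work(i) for every point i
// in its chunk. Workers write disjoint elements of `results`, so there is no data race.
template <typename Work>
std::vector<double> runMasterWorker(std::size_t totalPoints, std::size_t chunkSize,
                                    unsigned numWorkers, Work work) {
    assert(numWorkers > 0);
    std::vector<double> results(totalPoints, 0.0);
    ChunkDispatcher dispatcher(totalPoints, chunkSize);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&] {
            while (auto c = dispatcher.next()) {
                for (std::size_t i = c->begin; i < c->end; ++i) results[i] = work(i);
            }
        });
    }
    for (auto& t : workers) t.join();  // wait until every chunk has been processed
    return results;
}
```

Because an idle worker requests the next chunk only when it finishes its current one, faster workers automatically take more chunks. A larger chunk size reduces scheduling overhead at the cost of coarser balancing near the end of a run. The `work` callback is called concurrently from several threads and must be thread-safe.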