We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small-scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions, depending on the parameters set. For updating, the Cabibbo–Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors.

Program summary

Program title: STMC2LSU3MPI
Catalogue identifier: AEMJ_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 26666
No. of bytes in distributed program, including test data, etc.: 233126
Distribution format: tar.gz
Programming language: Fortran 77, compatible with the use of Fortran 90/95 compilers, in part with MPI extensions.
Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions.
Operating system:
1. Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0,
2. Centos 5.3 with OpenMPI + gfortran 4.1.2,
3. Cray XT4 with MPICH2 + pgf90 11.2-0.
Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions.
Number of processors used: 2 to 11664
RAM: 200 Megabytes per process.
Classification: 11.5.
Nature of problem: Physics of pure SU(3) Quantum Field Theory (QFT). This is relevant for our understanding of Quantum Chromodynamics (QCD). It includes the glueball spectrum, topological properties and the deconfining phase transition of pure SU(3) QFT. For instance, Relativistic Heavy Ion Collision (RHIC) experiments at the Brookhaven National Laboratory provide evidence that quarks confined in hadrons undergo, at sufficiently high temperature and pressure, a transition into a Quark-Gluon Plasma (QGP). Investigations of its thermodynamics in pure SU(3) QFT are of interest.
Solution method: Markov Chain Monte Carlo (MCMC) simulations of SU(3) Lattice Gauge Theory (LGT) with the Wilson action. This is a regularization of pure SU(3) QFT on a hypercubic lattice, which allows approaching the continuum SU(3) QFT by means of Finite Size Scaling (FSS) studies. Specifically, we provide updating routines for the Cabibbo-Marinari heatbath with and without checkerboard parallelization. While the former is suitable for pedagogical purposes and small-scale projects, the latter allows for efficient parallel processing. Targeting the geometry of RHIC experiments, we have implemented a Double-Layered Torus (DLT) lattice geometry, which has not previously been used in LGT MCMC simulations. It provides inside and outside layers at distinct temperatures, the lower-temperature layer acting as the outside boundary for the higher-temperature layer, in which the deconfinement transition takes place.
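To make the update scheme concrete, the following minimal Fortran sketch, written for this summary and not taken from the distributed package (the program name and all variable names are ours), multiplies an SU(3) link matrix by SU(2) matrices embedded in the (1,2), (1,3) and (2,3) subgroups, which is the core operation of the Cabibbo-Marinari scheme. In the package the SU(2) elements are drawn from the heatbath distribution determined by the staple sum; here they are drawn at random merely to keep the sketch self-contained.

      program cabmar_demo
c     Minimal sketch, not from the distributed package: an SU(3)
c     matrix u is multiplied by SU(2) matrices embedded in its
c     (1,2), (1,3) and (2,3) subgroups, as in the Cabibbo-Marinari
c     scheme.  The SU(2) elements are random here, not heatbath.
      implicit none
      complex*16 u(3,3),v(3,3),w(3,3),detu
      double precision a(0:3),anorm
      integer i,j,k,isub,l1(3),l2(3)
      data l1/1,1,2/, l2/2,3,3/
c     start from the unit link matrix
      do i=1,3
        do j=1,3
          u(i,j)=(0.d0,0.d0)
        end do
        u(i,i)=(1.d0,0.d0)
      end do
      do isub=1,3
c       random point on the SU(2) manifold: a0**2+...+a3**2=1
        do k=0,3
          call random_number(a(k))
          a(k)=2.d0*a(k)-1.d0
        end do
        anorm=sqrt(a(0)**2+a(1)**2+a(2)**2+a(3)**2)
        do k=0,3
          a(k)=a(k)/anorm
        end do
c       embed the SU(2) matrix a0+i*a.sigma into a 3x3 unit matrix
c       at rows/columns l1(isub) and l2(isub)
        do i=1,3
          do j=1,3
            v(i,j)=(0.d0,0.d0)
          end do
          v(i,i)=(1.d0,0.d0)
        end do
        v(l1(isub),l1(isub))=dcmplx( a(0), a(3))
        v(l1(isub),l2(isub))=dcmplx( a(2), a(1))
        v(l2(isub),l1(isub))=dcmplx(-a(2), a(1))
        v(l2(isub),l2(isub))=dcmplx( a(0),-a(3))
c       left-multiply the link: u <- v*u
        do i=1,3
          do j=1,3
            w(i,j)=(0.d0,0.d0)
            do k=1,3
              w(i,j)=w(i,j)+v(i,k)*u(k,j)
            end do
          end do
        end do
        do i=1,3
          do j=1,3
            u(i,j)=w(i,j)
          end do
        end do
      end do
c     the result should still be in SU(3): check the determinant
      detu=u(1,1)*(u(2,2)*u(3,3)-u(2,3)*u(3,2))
      detu=detu-u(1,2)*(u(2,1)*u(3,3)-u(2,3)*u(3,1))
      detu=detu+u(1,3)*(u(2,1)*u(3,2)-u(2,2)*u(3,1))
      write(*,*) 'det(u) = ',detu,' (should be (1.,0.))'
      end

The determinant printed at the end should equal one up to rounding, confirming that the product of SU(2) subgroup hits keeps the link in SU(3).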
Restrictions: The checkerboard partition of the lattice makes the development of measurement programs more tedious than is the case for an unpartitioned lattice. Presently, only one measurement routine, for Polyakov loops, is provided.
Unusual features: We provide three different versions of the send/receive function of the MPI library, which work for different operating system + compiler + MPI combinations; the appropriate version is chosen by activating the corresponding row among the last three rows of our latmpi.par parameter file. The underlying reason is distinct buffer conventions. A generic sketch of such alternatives is given after the Running time field below.
Running time: For a typical run using an Intel i7 processor, it takes (1.8–6)×10^-6 seconds to update one link of the lattice, depending on the compiler used. For example, for a simulation on a small 4×8^3 DLT lattice with a statistics of 2^21 sweeps (i.e., updating the two lattice layers of 4×(4×8^3) links each 2^21 times), the total CPU time needed can be

2 × 4 × (4×8^3) × 2^21 × 3×10^-6 seconds ≈ 1.7×10^3 minutes,

where
2 – two layers of the lattice,
4 – four dimensions (links per site),
4×8^3 – lattice size (number of sites),
2^21 – sweeps of updating,
3×10^-6 s – average time to update one link variable.
If we divide the job into 8 parallel processes, then the real time is (for negligible communication overhead) 1.7×10^3 min / 8 ≈ 0.2×10^3 min.
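As a generic illustration of the buffer-convention issue mentioned under Unusual features, and not a reproduction of the three versions shipped in the package, the following Fortran MPI sketch (program name, buffer size and ring topology are ours) shows two standard, deadlock-safe ways for neighbouring processes to exchange a boundary buffer: a combined MPI_SENDRECV, which is safe under any internal buffering policy, and an MPI_ISEND posted before a blocking MPI_RECV, which avoids relying on MPI_SEND buffering the outgoing message.

      program exchange_demo
c     Illustrative sketch, not taken from the package: two
c     deadlock-safe ways to exchange a boundary buffer between
c     neighbouring processes arranged on a ring.  Buffer size and
c     topology are hypothetical.
      implicit none
      include 'mpif.h'
      integer nbuf
      parameter (nbuf=1024)
      double precision sbuf(nbuf),rbuf(nbuf)
      integer myid,nproc,iright,ileft,i,ierr,ireq
      integer istat(MPI_STATUS_SIZE)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
      iright=mod(myid+1,nproc)
      ileft =mod(myid-1+nproc,nproc)
      do i=1,nbuf
        sbuf(i)=dble(myid)
      end do
c     variant 1: combined send/receive, safe for any buffering policy
      call MPI_SENDRECV(sbuf,nbuf,MPI_DOUBLE_PRECISION,iright,1,
     &                  rbuf,nbuf,MPI_DOUBLE_PRECISION,ileft,1,
     &                  MPI_COMM_WORLD,istat,ierr)
c     variant 2: non-blocking send paired with a blocking receive,
c     which does not rely on the implementation buffering MPI_SEND
      call MPI_ISEND(sbuf,nbuf,MPI_DOUBLE_PRECISION,iright,2,
     &               MPI_COMM_WORLD,ireq,ierr)
      call MPI_RECV(rbuf,nbuf,MPI_DOUBLE_PRECISION,ileft,2,
     &              MPI_COMM_WORLD,istat,ierr)
      call MPI_WAIT(ireq,istat,ierr)
      write(*,*) 'rank ',myid,' got data from ',ileft
      call MPI_FINALIZE(ierr)
      end

Both variants avoid the deadlock that can occur when every process issues a plain blocking MPI_SEND first and the implementation does not buffer the message internally; which convention an installation follows is exactly what the choice in latmpi.par accounts for.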