Modern graphics processing units (GPUs) are becoming a popular hardware substrate for spiking neural network simulations [1-3], owing to their massive parallelism and impressive cost-to-speed ratio. However, verifying and interpreting the results of a GPU simulation can be difficult because the results are never exactly reproducible, unlike those of an equivalent serial simulation on a CPU: not only is the simulation subject to the usual rounding errors of floating-point arithmetic, but it also acquires an element of stochasticity from the non-determinism of the hardware thread scheduler, combined with the non-associativity of floating-point addition and multiplication. Consider, for example, a typical postsynaptic integration step in which incoming synaptic currents are summed on a GPU. If multiple threads each simulate an incoming synapse, there is no guarantee of the order in which each thread's current is accumulated into the total. The rounding errors therefore depend on this order, and the result can differ every time the simulation runs. Such effects are initially small but can be amplified in unstable or chaotic systems to the point that the final results appear completely random across runs (see Figure 1).

Figure 1. In repeated runs, results of numerical simulations on GPUs can vary. Mean, standard deviation and range of the observed membrane potential of a neuron in a network of 10,000 Izhikevich neurons, 8,000 excitatory and 2,000 inhibitory, with 1,000 random connections ...

When comparing runs between GPU and CPU implementations there are additional sources of divergence, because the two architectures implement floating-point arithmetic in subtly different ways. For instance, the NVIDIA C2070 GPU tested in this study implements the fused multiply-add (FMA) operation introduced in the IEEE 754-2008 floating-point standard, whereas most Intel CPUs perform the multiplication and addition separately, with lower accuracy. Only Intel's most recent Haswell CPU architecture implements the more accurate FMA operation, and many current lab workstations contain chips that do not.

The aim of the work presented here is to determine analytically the theoretical worst-case and average-case absolute numerical error incurred when simulating neural network models on an NVIDIA CUDA GPU. These error bounds are compared with the absolute error of the equivalent serial algorithm running on a single CPU core, using standard float32 (float) and float64 (double) precision floating-point arithmetic, in order to establish a reasonable error margin for verifying the results of parallel GPU simulations against those of equivalent serial CPU simulations. Furthermore, both the CPU and the GPU implementations are compared against an equivalent simulation using an accurate arbitrary-precision floating-point arithmetic library, to determine how far the CPU and GPU simulation trajectories deviate from the analytically 'correct' trajectory; for illustration, the divergence of a single neuron in a 10,000-neuron Izhikevich network is plotted in Figure 1. Finally, we also analyse the errors originating from approximate integration methods and compare them to the underlying numerical errors discussed above.
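As a minimal sketch of the accumulation step described above (the kernel, variable names and synaptic currents are illustrative assumptions, not the simulator or model used in this study), the following CUDA fragment lets one thread per synapse add its current to a shared postsynaptic total with atomicAdd. Because the hardware scheduler decides the order of the atomic additions, and single-precision addition is not associative, repeated runs on identical inputs can produce totals that differ in their last bits.

```cuda
// Minimal sketch of non-deterministic parallel summation (illustrative only,
// not the simulator used in the study). One thread per incoming synapse adds
// its current to a shared postsynaptic total with atomicAdd; the order of the
// additions is decided by the hardware scheduler, so with non-associative
// float addition the rounded total can differ from run to run.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void accumulateCurrents(const float* synCurrents, int nSyn, float* total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nSyn) {
        atomicAdd(total, synCurrents[i]);  // accumulation order is unspecified
    }
}

int main()
{
    const int nSyn = 1000;                    // hypothetical number of incoming synapses
    float h_syn[nSyn];
    for (int i = 0; i < nSyn; ++i)
        h_syn[i] = 1e-3f * (i % 7) - 2e-3f;   // arbitrary illustrative currents

    float *d_syn, *d_total;
    cudaMalloc(&d_syn, nSyn * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_syn, h_syn, nSyn * sizeof(float), cudaMemcpyHostToDevice);

    for (int run = 0; run < 3; ++run) {
        const float zero = 0.0f;
        cudaMemcpy(d_total, &zero, sizeof(float), cudaMemcpyHostToDevice);
        accumulateCurrents<<<(nSyn + 255) / 256, 256>>>(d_syn, nSyn, d_total);
        float h_total;
        cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
        // Printing with full precision can expose run-to-run differences
        // in the last bits of the accumulated current.
        printf("run %d: total current = %.9g\n", run, h_total);
    }

    cudaFree(d_syn);
    cudaFree(d_total);
    return 0;
}
```

Note that atomicAdd on float requires a device of compute capability 2.0 or higher, such as the C2070 mentioned above.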
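The FMA-related divergence between CPU and GPU can likewise be illustrated with a small sketch (the operands are contrived for demonstration and are not data from the study): the device intrinsic __fmaf_rn rounds the exact value of a*b + c once, whereas the separate multiply-then-add path taken by CPUs without FMA hardware rounds twice, so the two results can disagree in the last bits.

```cuda
// Illustrative sketch of single versus double rounding (operands contrived
// for demonstration; not data from the study). The kernel uses the
// single-rounding FMA intrinsic; the host code performs the two-rounding
// multiply-then-add sequence used by CPUs without FMA hardware.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fusedKernel(float a, float b, float c, float* out)
{
    *out = __fmaf_rn(a, b, c);  // rounds the exact a*b + c once, to nearest-even
}

int main()
{
    const float eps = 1.0f / 8192.0f;  // 2^-13, exactly representable in float
    const float a = 1.0f + eps;
    const float b = 1.0f - eps;
    const float c = -1.0f;

    // Two roundings: a*b = 1 - 2^-26 rounds to 1.0f, so the sum is exactly 0.
    const float prod = a * b;
    const float separate = prod + c;

    float fused;
    float* d_out;
    cudaMalloc(&d_out, sizeof(float));
    fusedKernel<<<1, 1>>>(a, b, c, d_out);
    cudaMemcpy(&fused, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("separate multiply+add: %.9g\n", separate);  // 0
    printf("fused multiply-add   : %.9g\n", fused);     // -2^-26 (about -1.49e-8)
    return 0;
}
```

The product is stored in its own variable before the addition because a host compiler on FMA-capable hardware may otherwise contract a*b + c into a fused operation itself, depending on its floating-point contraction settings.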