Common Unified Device Architecture Research Articles

A major challenge in computational neuroscience is to achieve high performance for real-time simulations of full-size brain networks. Recent advances in GPU technology provide massively parallel, low-cost and efficient hardware that is widely available on the computer market. However, the comparatively low-level programming that is necessary to create an efficient GPU-compatible implementation of neuronal network simulations can be challenging, even for otherwise experienced programmers. To address this problem, a number of tools for simulating spiking neural networks (SNN) on GPUs have been developed [1,2], but using a particular simulator usually restricts the user to the neuron models, synapse models or connectivity schemes that it supports. Besides being inconvenient, this can unduly influence the path of scientific enquiry. Here we present GeNN (GPU enhanced neuronal networks), which builds on NVIDIA's common unified device architecture (CUDA) to enable a more flexible framework. CUDA allows programmers to write C-like code and execute it on NVIDIA’s massively parallel GPUs. However, in order to achieve good performance, it is critical but not trivial to make the right choices on how to parallelize a computational problem, organize its data in memory and optimize the memory access patterns. GeNN is based on the idea that much of this optimization can be cast into heuristics that allow the GeNN meta-compiler to generate optimized GPU code from a basic description of the neuronal network model in a minimal domain-specific language of C function calls. For further simplification, this description may also be obtained by translating variables, dynamical equations and parameters from an external simulator into GeNN input files. We are developing this approach for the Brian 2 [3] and SpineCreator/SpineML [4] systems. Using a code generation approach in GeNN has important advantages: 1. A large number of different neuron and synapse models can be provided without performance losses in the final simulation code. 2. The generated simulator code can be optimized for the available GPU hardware and for the specific model. 3. The framework is intrinsically extensible: new GPU optimization strategies, including strategies of other simulators, can be added in the generated code for situations where they are effective. The first release version of GeNN is available at http://sourceforge.net/projects/genn. It has been built and optimized for simulating neuronal networks with an anatomical structure (separate neuron populations with sparse or dense connection patterns, with the possibility of using some common learning rules). We have executed performance and scalability tests on an NVIDIA Tesla C2070 GPU with an Intel Xeon(R) E5-2609 CPU running Ubuntu 12.04 LTS. Our results show that as the network size increases, GPU simulations consistently outperform CPU simulations. However, we are also able to demonstrate the performance limits of using GPUs with GeNN under different scenarios of network connectivity, learning rules and simulation parameters, confirming that GPU acceleration can differ considerably depending on the particular details of the model of interest.
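The distinction between dense and sparse connection patterns mentioned in the abstract can be illustrated with a small CPU-side sketch. The structure and function names below (`DenseSynapses`, `SparseSynapses`, `propagateDense`, `propagateSparse`, `toSparse`) are hypothetical and chosen for illustration only; they are not GeNN's actual data structures, and a GPU implementation would distribute the loops over threads rather than run them serially. The sparse layout is a generic compressed-row format, assumed here as one common way to store only the synapses that exist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense connectivity: one weight per (pre, post) pair, stored row by row
// (one row per presynaptic neuron). A weight of 0.0f means "no synapse".
struct DenseSynapses {
    std::size_t numPre;
    std::size_t numPost;
    std::vector<float> weight; // numPre * numPost entries
};

// Sparse connectivity in compressed-row form: only existing synapses are kept.
struct SparseSynapses {
    std::vector<std::size_t> rowStart; // numPre + 1 offsets into target/weight
    std::vector<std::size_t> target;   // postsynaptic index of each synapse
    std::vector<float> weight;         // weight of each synapse
};

// For every spiking presynaptic neuron, add its outgoing weights to the
// summed input of the postsynaptic neurons (dense version visits all pairs).
void propagateDense(const DenseSynapses& s, const std::vector<std::size_t>& spikes,
                    std::vector<float>& input) {
    assert(input.size() == s.numPost);
    for (std::size_t pre : spikes)
        for (std::size_t post = 0; post < s.numPost; ++post)
            input[post] += s.weight[pre * s.numPost + post];
}

// Same operation, but only the stored synapses of each spiking neuron are visited.
void propagateSparse(const SparseSynapses& s, const std::vector<std::size_t>& spikes,
                     std::vector<float>& input) {
    for (std::size_t pre : spikes)
        for (std::size_t k = s.rowStart[pre]; k < s.rowStart[pre + 1]; ++k)
            input[s.target[k]] += s.weight[k];
}

// Convert a dense matrix into the compressed-row form by dropping zero weights.
SparseSynapses toSparse(const DenseSynapses& d) {
    SparseSynapses s;
    s.rowStart.push_back(0);
    for (std::size_t pre = 0; pre < d.numPre; ++pre) {
        for (std::size_t post = 0; post < d.numPost; ++post) {
            const float w = d.weight[pre * d.numPost + post];
            if (w != 0.0f) {
                s.target.push_back(post);
                s.weight.push_back(w);
            }
        }
        s.rowStart.push_back(s.target.size());
    }
    return s;
}
```

Both functions produce the same postsynaptic input. The dense form uses memory proportional to the product of the population sizes but has a regular access pattern, while the sparse form stores only existing synapses at the cost of indirect indexing. Which one is faster on a given GPU depends on the connection density, which is consistent with the abstract's remark that the achievable acceleration depends on the details of the model.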

Simulating large-scale computer models of brain structures with spiking neuronal networks has become increasingly popular and feasible with the advent of general purpose computing on graphical processing units (GPGPU). Modern graphics cards, such as the NVidia® range supporting the common unified device architecture (CUDA™), provide massively parallel computing architectures for this purpose. Earlier GPU implementations of neural networks, including my own earlier work [1,2], were customized for specific models, and optimized and tested with specific hardware. Recently, more general spiking neuronal network simulators have been developed [3,4] that allow the definition of the network connectivity and of neuron and synapse parameters at runtime. However, these simulators are still quite specific: they use a single neuron model (typically Izhikevich neurons) and a single synapse model (typically stateless synapses with delay), and they have been tested for a typical model type and on specific GPU hardware. In this work I present a framework of semi-automated code generation for simulating neuronal networks on GPU hardware. Using code generation to build a specific simulator engine for each individual network model has important advantages: (i) The simulator system can provide a large choice of different neuron and synapse models for use in simulations without creating any overheads or performance losses in the actual simulation code. It also allows the inclusion of user-defined models without the user needing to understand the GPU code. (ii) The generated simulator software can be optimized for the available GPU hardware and for the structure of the specific model. (iii) The framework is intrinsically extensible: new GPU optimization strategies can be added and strategies of existing simulators can be included in the generated code for situations where they are effective. 
An embryonic beta version of such a framework has been built and optimized for simulating neuronal networks with an anatomical structure (separate neuron populations that are densely connected), building on our earlier work on neuronal network models [1,2] of the olfactory system of insects [5,6]. The prototype framework consists of a C++ source library that generates CUDA kernels and runtime code according to a user-specified neuronal network model. Unlike existing simulators [3,4], it exploits the fact that most parameters of neuronal network simulations are known at compile time and do not change during a simulation. These parameters are hard-coded into the CUDA kernels, saving valuable register and shared memory space. In practice, the system is used by defining a neuronal network in a C++ class and then compiling and executing the code generation software. The generated C++/CUDA code is compiled with user-side simulation code into a lean stand-alone executable. I have tested the system on an NVidia Quadro FX 5800 device hosted in a PC with an Intel Xeon quad-core CPU with 8 MB cache and 12 GB RAM. The observed GPU versus CPU speedups varied from none to 76x. The peak observed spike delivery rate was 2.4 billion per second, and the largest simulated network had about 1 million neurons and 635 million synapses (limited by the 4 GB device memory on the FX 5800). In the future I will extend the system with additional model elements, optimizations and an improved API for use by the CNS community.
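The idea of hard-coding compile-time parameters into generated kernels can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual generator: the `NeuronModel` structure, the `$(name)` placeholder syntax, and the functions `substitute`, `floatLiteral` and `generateKernel` are all hypothetical names introduced here. The sketch emits CUDA kernel source as text and replaces each parameter placeholder with a numeric literal, so the resulting kernel needs no parameter arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical model description; the names are illustrative only.
struct NeuronModel {
    std::string name;
    std::map<std::string, double> params; // fixed for the whole simulation
    std::string updateCode;               // update rule with $(name) placeholders
};

// Replace every occurrence of "$(key)" in code with text.
std::string substitute(std::string code, const std::string& key, const std::string& text) {
    const std::string token = "$(" + key + ")";
    for (std::size_t pos = code.find(token); pos != std::string::npos;
         pos = code.find(token, pos + text.size()))
        code.replace(pos, token.size(), text);
    return code;
}

// Format a finite value as a parenthesized single-precision literal, e.g. (-60.0f).
std::string floatLiteral(double v) {
    std::ostringstream os;
    os << v;
    std::string s = os.str();
    if (s.find_first_of(".e") == std::string::npos)
        s += ".0"; // "30" alone would not form a valid float literal with an f suffix
    return "(" + s + "f)";
}

// Emit CUDA source for one neuron population with all parameters baked in.
std::string generateKernel(const NeuronModel& m, unsigned numNeurons) {
    std::string body = m.updateCode;
    for (const auto& p : m.params)
        body = substitute(body, p.first, floatLiteral(p.second));
    assert(body.find("$(") == std::string::npos); // every placeholder must be resolved

    std::ostringstream os;
    os << "__global__ void update" << m.name << "(float *V, const float *Isyn)\n"
       << "{\n"
       << "    const unsigned id = blockIdx.x * blockDim.x + threadIdx.x;\n"
       << "    if (id < " << numNeurons << "u) {\n"
       << "        float v = V[id];\n"
       << "        const float isyn = Isyn[id];\n"
       << "        " << body << "\n"
       << "        V[id] = v;\n"
       << "    }\n"
       << "}\n";
    return os.str();
}
```

For a leaky integrator described by the update rule `v += $(dt) * (-(v - $(Vrest)) / $(tau) + isyn);` with `dt = 0.1`, `Vrest = -60` and `tau = 20`, the generated kernel body contains only literals, and the population size appears as a constant in the bounds check. The emitted text would then be compiled together with the user's simulation code, as the abstract describes.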

Articles published on Common Unified Device Architecture

High-accuracy and video-rate lifetime extraction from time correlated single photon counting data on a graphical processing unit

More flexibility for code generation with GeNN v2.1

Simulating spiking neural networks on massively parallel graphical processing units using a code generation approach with GeNN

Design of CUDA-based parallel processing software for real-time 3D laser radar image generation

Fast simulation of Proton Induced X-Ray Emission Tomography using CUDA

Multi-GPU-based Swendsen–Wang multi-cluster algorithm for the simulation of two-dimensional [formula omitted]-state Potts model

Hybrid general-purpose computation on GPU (GPGPU) and computer graphics synthetic aperture radar simulation for complex scenes

GPU-based Swendsen–Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems

GPU-based single-cluster algorithm for the simulation of the Ising model

Flexible neuronal network simulation framework using code generation for NVidia® CUDA™

Iterative Reconstruction for Transmission Tomography on GPU Using Nvidia CUDA

FAST RECONSTRUCTION METHOD BASED ON COMMON UNIFIED DEVICE ARCHITECTURE (CUDA) FOR MICRO-CT

Recent Development of Molecular Simulation Based on GPU in Material Science
