Abstract

The simulation of the behavior of the Human Brain is one of the most important challenges in computing today. The main problem is finding efficient ways to manipulate and compute the huge volume of data that these simulations require using current technology. This work focuses on one of the main steps of such a simulation: computing the Voltage on the neurons' morphology. This is carried out using the Hines Algorithm. Although this algorithm is the optimal method in terms of number of operations, it needs non-trivial modifications to be parallelized efficiently on GPUs. We propose several optimizations to accelerate this algorithm on GPU-based architectures, exploring the limitations of both the method and the architecture, so as to solve a high number of Hines systems (neurons) efficiently. Each optimization is analyzed and described in depth. Two approaches are studied: one for mono-morphology simulations (a batch of neurons with the same shape) and one for multi-morphology simulations (a batch of neurons where every neuron has a different shape). In mono-morphology simulations we obtain good performance using a single kernel to compute all the neurons. However, this turns out to be inefficient for multi-morphology simulations, where a much more complex implementation is necessary to obtain good performance. In this case, we must execute more than one GPU kernel, and each kernel call solves one specific part of the batch of neurons. These parts can be seen as multiple independent tridiagonal systems. Although the present paper focuses on the simulation of the behavior of the Human Brain, some of these techniques, in particular those related to solving tridiagonal systems, can also be used for multiple oil and gas simulations.
Our studies show that the optimizations proposed in the present work achieve high performance on computations with a high number of neurons: our GPU implementations are about 4× and 8× faster than the OpenMP multicore implementation (16 cores) when using one and two NVIDIA K80 GPUs, respectively. It is also important to highlight that these optimizations continue to scale even when dealing with a very high number of neurons.
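The abstract names the Hines Algorithm without spelling it out. As an illustration only, the following is a minimal serial sketch in Python of the solver as it is commonly described: a backward sweep that eliminates the upper entries from the leaves toward the root, followed by a forward substitution. The function names, the argument layout (`u`, `l`, `d`, `rhs`, parent vector `p`) and the convention that every node has a parent with a smaller index are assumptions made for this sketch, not the paper's GPU implementation.

```python
def solve_hines(u, l, d, rhs, p):
    """Solve one Hines system (serial sketch, hypothetical helper).

    Assumed layout: for every node i > 0 with parent p[i] < i,
    l[i] is the matrix entry A[i][p[i]] and u[i] is A[p[i]][i];
    d is the diagonal. Index 0 is the root (its parent entry is ignored).
    """
    n = len(d)
    d = list(d)      # work on copies so the inputs are not modified
    x = list(rhs)

    # Backward sweep: eliminate the upper entry of each node from its parent row.
    for i in range(n - 1, 0, -1):
        factor = u[i] / d[i]
        d[p[i]] -= factor * l[i]
        x[p[i]] -= factor * x[i]

    # Forward sweep: substitute from the root down to the leaves.
    x[0] /= d[0]
    for i in range(1, n):
        x[i] = (x[i] - l[i] * x[p[i]]) / d[i]
    return x


def hines_matvec(u, l, d, p, x):
    """Multiply the Hines matrix by x (used only to check a solution)."""
    n = len(d)
    y = [d[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += l[i] * x[p[i]]
        y[p[i]] += u[i] * x[i]
    return y


if __name__ == "__main__":
    # Small branching morphology: node 1 has two children (2 and 3).
    p = [0, 0, 1, 1, 3]
    d = [4.0] * 5
    u = [0.0, -1.0, -1.0, -1.0, -1.0]
    l = [0.0, -1.0, -1.0, -1.0, -1.0]
    rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
    x = solve_hines(u, l, d, rhs, p)
    print(x)
```

When `p[i] == i - 1` for every node, the matrix is tridiagonal and the same two sweeps solve an ordinary tridiagonal system, which is one way to read the abstract's remark that parts of a neuron can be treated as independent tridiagonal systems.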

Highlights

  • We propose several optimizations to accelerate the Hines algorithm on GPU-based architectures, exploring the limitations of both the method and the architecture, so as to solve a high number of Hines systems efficiently

  • Our studies show that the optimizations proposed in the present work achieve high performance on computations with a high number of neurons: our GPU implementations are about 4× and 8× faster than the OpenMP multicore implementation (16 cores) when using one and two NVIDIA K80 GPUs, respectively

  • We also focus on the implementation of a kernel that makes use of some of these ideas to solve tridiagonal systems instead of Hines systems


Summary

Motivation

We can find multiple initiatives that attempt to simulate the behavior of the Human Brain by computer [1,2,3]. Although multiple works have explored the use of GPUs to compute multiple independent problems in parallel without transforming the data layout [14,15,16,17], the particular sparsity of the Hines matrices forces us to modify the data layout to exploit the memory hierarchy of the GPUs efficiently (coalescing accesses to GPU memory). This work includes a completely new approach to one of the most important challenges in the simulation of the Human Brain: simulations that involve neurons with different morphologies (multi-morphology simulations). To deal with this particular scenario, we must compute parts of the batch of neurons separately. We describe the numerical framework behind the computation of the Voltage on the neurons' morphology. The Block-Interleaved data layout can take better advantage of the growing importance of the ever larger cache memories in the memory hierarchy of current and upcoming GPU architectures.
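To make the data-layout argument concrete, here is a small Python sketch of one plausible index mapping for a block-interleaved layout. It assumes that systems are grouped in blocks of `block_size`, and that inside a block the same element of consecutive systems is stored contiguously. The function names, parameters and the exact formula are assumptions for illustration; the summary above does not define the layout precisely.

```python
def flat_index(system, element, system_size):
    """Contiguous layout: each system is stored one after another."""
    return system * system_size + element


def block_interleaved_index(system, element, system_size, block_size):
    """Assumed block-interleaved layout (hypothetical formula).

    Systems are grouped in blocks of block_size. Inside a block,
    element e of all systems is stored before element e + 1.
    """
    block = system // block_size   # which group of systems
    lane = system % block_size     # position of the system inside its group
    return block * block_size * system_size + element * block_size + lane


if __name__ == "__main__":
    system_size, block_size = 3, 4
    # Addresses touched when 4 threads (one per system) read element 1.
    print([flat_index(s, 1, system_size) for s in range(4)])
    print([block_interleaved_index(s, 1, system_size, block_size)
           for s in range(4)])
```

With the contiguous layout, neighbouring threads touch addresses that are `system_size` apart, whereas with the interleaved mapping they touch consecutive addresses, which is the access pattern that GPU memory coalescing favours.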

Implementation based on Shared Memory
Performance analysis
Remarks
Tridiagonal linear systems
Implementation of cuThomasBatch
Findings
Performance analysis of multi-morphology Hines
