Single-core Processors Research Articles

Purpose: To assess the performance and uncertainty of a pre‐calculated Monte Carlo (PMC) algorithm for proton and electron transport running on graphics processing units (GPUs). While PMC methods have been described in the past, an explicit quantification of the latent uncertainty arising from recycling a limited number of tracks in the pre‐generated track bank is missing from the literature. With a proper uncertainty analysis, an optimal pre‐generated track bank size can be selected for a desired dose calculation uncertainty.

Methods: Particle tracks were pre‐generated for electrons and protons using EGSnrc and GEANT4, respectively. The PMC algorithm for track transport was implemented in the CUDA programming framework. GPU‐PMC dose distributions were compared to benchmark dose distributions simulated under the same conditions with general‐purpose MC codes. A latent uncertainty analysis was performed by comparing GPU‐PMC dose values to a “ground truth” benchmark while varying the track bank size and the number of primary particle histories.

Results: GPU‐PMC and benchmark doses agreed within 1% in voxels receiving more than 50% of Dmax. In proton calculations, a submillimeter distance‐to‐agreement error was observed at the Bragg peak. Latent uncertainty followed Poisson statistics as a function of the number of tracks per energy (TPE), and a track bank of 20,000 TPE produced a latent uncertainty of approximately 1%. Efficiency analysis showed gains of 937× and 508× over a single processor core running DOSXYZnrc for 16 MeV electrons in water and bone, respectively.

Conclusion: The GPU‐PMC method can calculate dose distributions for electrons and protons to a statistical uncertainty below 1%. The track bank size needed for optimal efficiency can be tuned to the desired uncertainty. Coupled with a model to calculate dose contributions from uncharged particles, GPU‐PMC is a candidate for inverse planning of modulated electron radiotherapy and scanned proton beams.

This work was supported in part by FRSQ‐MSSS (Grant No. 22090), NSERC RG (Grant No. 432290) and CIHR MOP (Grant No. MOP‐211360).
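The latent uncertainty described in this abstract can be illustrated with a toy model. The sketch below is hypothetical and deliberately simplified: each "track" is reduced to a single exponentially distributed energy deposit with a known true mean (an assumption; real banks hold full EGSnrc or GEANT4 tracks transported through voxels), and the names `make_bank`, `pmc_estimate` and `latent_spread` are ours. Each history recycles a random track from a finite bank, so as the number of histories grows the estimate converges to that bank's own mean rather than the true mean. Repeating the experiment over independent banks exposes the latent spread, which for independent, identically distributed deposits should scale roughly as 1/√N with bank size N, in line with the Poisson-like behaviour reported above.

```python
import random
import statistics

# Hypothetical stand-in for the dose one pre-generated track deposits in a
# voxel: an exponential random variable, so its standard deviation also
# equals TRUE_MEAN. Real PMC banks store full particle tracks, not scalars.
TRUE_MEAN = 1.0


def make_bank(n_tracks, rng):
    """Pre-generate a bank of n_tracks hypothetical per-track deposits."""
    return [rng.expovariate(1.0 / TRUE_MEAN) for _ in range(n_tracks)]


def pmc_estimate(bank, n_histories, rng):
    """Mean deposit over n_histories histories, each recycling a random bank track."""
    return sum(rng.choice(bank) for _ in range(n_histories)) / n_histories


def latent_spread(n_tracks, n_histories, n_banks, seed=0):
    """Relative spread of PMC estimates across independently generated banks.

    When n_histories >> n_tracks, history-to-history noise averages out and
    each estimate approaches its own bank's mean, so the remaining spread is
    dominated by the latent uncertainty of the finite bank.
    """
    rng = random.Random(seed)
    estimates = [
        pmc_estimate(make_bank(n_tracks, rng), n_histories, rng)
        for _ in range(n_banks)
    ]
    return statistics.stdev(estimates) / TRUE_MEAN


if __name__ == "__main__":
    # Quadrupling the bank size should roughly halve the spread.
    for n in (100, 400, 1600):
        print(n, round(latent_spread(n, n_histories=10 * n, n_banks=30), 4))
```

Increasing `n_histories` alone for a fixed bank does not remove the spread, which is why the bank size, and not only the history count, has to be chosen against the target uncertainty.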

Processor architectures have taken a turn toward many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multicore and single-core processors because they require parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling, increasingly finer-grain parallelism must be extracted to keep all processing cores busy. Task dataflow programming models have high potential to simplify parallel programming because they relieve the programmer of identifying precisely all intertask dependences when writing programs. Instead, the task dataflow runtime system detects and enforces intertask dependences during execution, based on a description of the memory accessed by each task. The runtime constructs a task dataflow graph that captures all tasks and their dependences, and tasks are scheduled to execute in parallel while respecting the dependences in that graph. Several papers report significant overheads for task dataflow systems, which severely limit their scalability and usability. In this article, we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output, and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. We then present three schemes to manage task graphs, building on graph representations, hypergraphs, and lists. We also consider a fourth, edgeless scheme that synchronizes tasks using integers. Analysis using microbenchmarks shows that the graph representation is not always scalable and that the edgeless scheme introduces the least overhead in nearly all situations.
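The dependence detection described in this abstract can be sketched as a small model. This is a hypothetical Python illustration (the class and method names are ours, not the paper's): each object keeps its last writer and the readers of its current version, which loosely mirrors the notions of versions and generations. From in/out annotations it derives read-after-write, write-after-read and write-after-write edges, then groups tasks into waves that could run in parallel. It does not model commutative in/out arguments, reductions, or the hypergraph, list and edgeless schemes.

```python
from collections import defaultdict


class DependenceTracker:
    """Toy dependence tracker for tasks with in/out/inout argument annotations."""

    def __init__(self):
        self.last_writer = {}             # object -> task that wrote its current version
        self.readers = defaultdict(list)  # object -> tasks reading its current version
        self.edges = {}                   # task -> set of tasks it must wait for
        self.order = []                   # tasks in submission (program) order

    def submit(self, task, ins=(), outs=()):
        """Register a task in program order; an in/out argument goes in both ins and outs."""
        deps = set()
        for obj in ins:
            if obj in self.last_writer:        # read-after-write
                deps.add(self.last_writer[obj])
        for obj in outs:
            if self.readers[obj]:              # write-after-read
                deps.update(self.readers[obj])
            elif obj in self.last_writer:      # write-after-write
                deps.add(self.last_writer[obj])
        self.edges[task] = deps
        self.order.append(task)
        for obj in ins:
            if obj not in outs:
                self.readers[obj].append(task)  # joins readers of the current version
        for obj in outs:
            self.last_writer[obj] = task        # starts a new version of obj
            self.readers[obj] = []

    def waves(self):
        """Group tasks into waves; tasks within one wave have no dependences among them."""
        level = {}
        for task in self.order:  # dependences always point backwards in program order
            level[task] = 1 + max((level[d] for d in self.edges[task]), default=-1)
        result = [[] for _ in range(max(level.values(), default=-1) + 1)]
        for task in self.order:
            result[level[task]].append(task)
        return result


if __name__ == "__main__":
    t = DependenceTracker()
    t.submit("A", outs=["x"])
    t.submit("B", ins=["x"], outs=["y"])
    t.submit("C", ins=["x"], outs=["z"])
    t.submit("D", ins=["y", "z"], outs=["x"])
    print(t.waves())
```

In this example B and C only read `x`, so they share one wave, while D must wait for both: it reads their outputs and also overwrites `x`, which they read.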

Articles published on Single-core Processors

High Level Model of Time Predictable Multitask Control Unit

Massive Image Treatment System Based on Cloud Computing Platform

Timing Verification of Fault-Tolerant Chips for Safety-Critical Applications in Harsh Environments

Parallelization Strategies for the GPS Radio Occultation Data Assimilation with a Nonlocal Operator in the Weather Research and Forecasting Model

The Status and Challenges of Multi-Processor System-on-Chip’s Formal Verification

RCSoS: An IEC 61508 Compatible Server Model for Reliable Communication

Comparative Study of Parallel Programming Models to Compute Complex Algorithm

Cholesky-decomposed density MP2 with density fitting: accurate MP2 and double-hybrid DFT energies for large systems.

Emulation and Analytical Model of PIM Supplemental Computing Element via HDL

TH‐A‐19A‐04: Latent Uncertainties and Performance of a GPU‐Implemented Pre‐Calculated Track Monte Carlo Method

Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique

Design of Multi-core Processor Software with Pipelining Strategy

Pre-treatment radiotherapy dose verification using Monte Carlo doselet modulation in a spherical phantom

An Architecture for Measuring Network Performance in Multi-Core Multi-Cluster Architecture (MCMCA)

LaBonte's method revisited: An effective steepest descent method for micromagnetic energy minimization

Parallel Implementation of Color Based Image Retrieval Using CUDA on the GPU

Analysis of dependence tracking algorithms for task dataflow execution

A DSP-Based HEVC decoder implementation using an actor language dataflow model

Methods to explore design space for MPEG RMC codec specifications
