Abstract
Researchers face a daunting task in providing scientific visualization capabilities for exascale computing. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Multiple vendors create such accelerator processors, each with significantly different features and performance characteristics. To address these visualization needs across multiple platforms, we are embracing the use of data parallel primitives that encapsulate highly efficient parallel algorithms that can be used as building blocks for conglomerate visualization algorithms. We can achieve performance portability by optimizing this small set of data parallel primitives, whose tuning conveys to the conglomerates. In this paper we provide an overview of how to use data parallel primitives to solve some of the most common problems in visualization algorithms. We then describe how we are using these fundamental approaches to build a new toolkit, VTK-m, that provides efficient visualization algorithms on multi- and many-core architectures. We conclude by comparing a visualization algorithm written with data parallel primitives against separate versions hand-written for different architectures, showing that data parallel primitives achieve comparable performance with far less development work.
Highlights
The basic architecture of high-performance computing platforms has remained homogeneous and consistent for over a decade, but revolutionary changes are coming
An alarming feature of Table 1 is the increase in the concurrency of the system: up to five orders of magnitude. This comes from an increase in both the number of cores and the number of threads run per core. (Modern cores employ techniques like hyperthreading to run multiple threads per core to overcome latencies in the system.) We currently stand about halfway through the transition from petascale to exascale, and we can observe this prediction coming to fruition through the use of accelerator or many-core processors
Portable data parallel primitive implementations should have close to the performance of a non-portable algorithm designed and optimized for a particular device
Summary
The basic architecture of high-performance computing platforms has remained homogeneous and consistent for over a decade, but revolutionary changes are coming. Power constraints and physical limitations are impelling the use of new types of processors, heterogeneous architectures, and deeper memory and storage hierarchies. Such drastic changes propagate to the design of software that is run on these high-performance computers and how we use them. An alarming feature of Table 1 is the increase in the concurrency of the system: up to five orders of magnitude. This comes from an increase in both the number of cores and the number of threads run per core. (Modern cores employ techniques like hyperthreading to run multiple threads per core to overcome latencies in the system.) We currently stand about halfway through the transition from petascale to exascale, and we can observe this prediction coming to fruition through the use of accelerator or many-core processors. A key strategy has been the use of data parallel primitives, since the approach enables simplified algorithm development and helps to achieve portable performance