Abstract
Researchers face a daunting task in providing scientific visualization capabilities for exascale computing. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Multiple vendors create such accelerator processors, each with significantly different features and performance characteristics. To address these visualization needs across multiple platforms, we are embracing the use of data parallel primitives that encapsulate highly efficient parallel algorithms that can be used as building blocks for conglomerate visualization algorithms. We can achieve performance portability by optimizing this small set of data parallel primitives, whose tuning conveys to the conglomerates. In this paper we provide an overview of how to use data parallel primitives to solve some of the most common problems in visualization algorithms. We then describe how we are using these fundamental approaches to build a new toolkit, VTK-m, that provides efficient visualization algorithms on multi- and many-core architectures. We conclude by comparing a visualization algorithm written with data parallel primitives against separate versions hand-written for different architectures, showing that data parallel primitives achieve comparable performance with far less development work.
Highlights
The basic architecture of high-performance computing platforms has remained homogeneous and consistent for over a decade, but revolutionary changes are coming
An alarming feature of Table 1 is the increase in the concurrency of the system: up to five orders of magnitude. This comes from an increase in both the number of cores and the number of threads run per core. (Modern cores employ techniques like hyperthreading to run multiple threads per core to overcome latencies in the system.) We currently stand about halfway through the transition from petascale to exascale, and we can observe this prediction coming to fruition through the use of accelerator or many-core processors
Portable data parallel primitive implementations should have close to the performance of a non-portable algorithm designed and optimized for a particular device
Summary
The basic architecture of high-performance computing platforms has remained homogeneous and consistent for over a decade, but revolutionary changes are coming. Power constraints and physical limitations are impelling the use of new types of processors, heterogeneous architectures, and deeper memory and storage hierarchies. Such drastic changes propagate to the design of software that is run on these high-performance computers and how we use them. An alarming feature of Table 1 is the increase in the concurrency of the system: up to five orders of magnitude. This comes from an increase in both the number of cores and the number of threads run per core. (Modern cores employ techniques like hyperthreading to run multiple threads per core to overcome latencies in the system.) We currently stand about halfway through the transition from petascale to exascale, and we can observe this prediction coming to fruition through the use of accelerator or many-core processors. A key strategy has been the use of data parallel primitives, since the approach enables simplified algorithm development and helps to achieve portable performance