Abstract

Parallel volume rendering is implemented and tested on an IBM Blue Gene distributed-memory parallel architecture. The goal of studying the cost of parallel rendering on a new class of supercomputers such as the Blue Gene/P is not necessarily to achieve real-time rendering rates but to identify and understand the bottlenecks and interactions among components that affect the design of future visualization solutions on these machines, solutions that may offer alternatives to hardware-accelerated volume rendering, for example, when large volumes, large image sizes, and very high quality results are dictated by peta- and exascale data. As a step in that direction, this study presents data from experiments under a number of conditions, including dataset size, number of processors, low- and high-quality rendering, offline storage of results, and streaming of images for remote display. Performance is analyzed across the three main stages of the algorithm: disk I/O, rendering, and compositing. The dynamic balance among these stages varies with the number of processors and other conditions. Lessons learned from the work include an understanding of the balance between parallel I/O, computation, and communication within the context of visualization on supercomputers; recommendations for tuning and optimization; and opportunities for further scaling. Extrapolating these results to very large data and image sizes suggests that a distributed-memory high-performance computing architecture such as the Blue Gene is a viable platform for some types of visualization at very large scales.
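
To make the three-stage decomposition concrete, the sketch below instruments an MPI-based volume-rendering skeleton with per-stage timers. It is an illustration only, not the paper's implementation: the file name (volume.raw), the 512^3 dataset size, the 1024x1024 image, the stubbed ray-casting loop, and the use of a sum-reduction in place of ordered compositing (production renderers use schemes such as direct-send or binary-swap) are all assumptions introduced here.

/*
 * Minimal sketch (not the paper's code) of timing the three pipeline
 * stages -- disk I/O, rendering, compositing -- in an MPI volume renderer.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define IMG_W 1024
#define IMG_H 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Stage 1: parallel disk I/O. Each rank reads a contiguous slab of
     * the (assumed) 512^3 byte volume with collective-capable MPI-IO.   */
    long total_voxels = 512L * 512L * 512L;
    long slab = total_voxels / nprocs;
    unsigned char *vox = malloc(slab);

    double t0 = MPI_Wtime();
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "volume.raw",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_read_at(fh, (MPI_Offset)rank * slab, vox, (int)slab,
                     MPI_UNSIGNED_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    double t_io = MPI_Wtime() - t0;

    /* Stage 2: local rendering. A real renderer ray-casts the local
     * subvolume; this stand-in loop just accumulates voxel values into
     * a partial image so the sketch stays self-contained.               */
    float *partial = calloc((size_t)IMG_W * IMG_H, sizeof(float));
    t0 = MPI_Wtime();
    for (long i = 0; i < slab; i++)
        partial[i % ((long)IMG_W * IMG_H)] += vox[i] / 255.0f;
    double t_render = MPI_Wtime() - t0;

    /* Stage 3: compositing, simplified here to a sum-reduction onto
     * rank 0. Correct front-to-back alpha compositing requires an
     * ordered exchange such as direct-send or binary-swap.              */
    float *image = (rank == 0)
        ? calloc((size_t)IMG_W * IMG_H, sizeof(float)) : NULL;
    t0 = MPI_Wtime();
    MPI_Reduce(partial, image, IMG_W * IMG_H, MPI_FLOAT,
               MPI_SUM, 0, MPI_COMM_WORLD);
    double t_comp = MPI_Wtime() - t0;

    /* Report the slowest rank per stage; the balance among the three
     * numbers is what shifts as the processor count grows.              */
    double in[3] = { t_io, t_render, t_comp }, out[3];
    MPI_Reduce(in, out, 3, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("max I/O %.3fs  render %.3fs  composite %.3fs\n",
               out[0], out[1], out[2]);

    free(vox); free(partial); free(image);
    MPI_Finalize();
    return 0;
}

Reporting the per-stage maximum across ranks, rather than the average, exposes stragglers and makes the shifting balance among I/O, rendering, and compositing visible as the run scales.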
