A parallel image-rendering algorithm and architecture based on ray tracing and radiosity shading

Li-Sheng Shen, Ed F Deprettere, P Dewilde

doi:10.1016/0097-8493(94)00154-q

Open Access

https://doi.org/10.1016/0097-8493(94)00154-q

Abstract

In this paper, we explore ways to improve the performance of a ray-casting-based approach for visualizing artificial scenes with photorealism on the screen of a workstation. We aim to develop a parallel image-rendering algorithm and architecture based on the so-called two-pass approach [1–4]. This approach normally demands orders of magnitude more processing power than a single processor provides if we wish to produce a state-of-the-art image in real time or even interactive time. Several attempts have been made to boost processing power by using parallel architectures, but they still suffer from high overheads due to latency and synchronization. We argue that large speedups and low overheads can only be attained through combined algorithm and architecture design. This combined effort yields a well-matched algorithm-architecture pair, namely the shelling technique [5–7] and a pipelined parallel architecture [8], in which a parameterized space partitioning on the one hand finds its counterpart in a scalable network of clusters on the other. The target system, which consists of a host computer and a scalable network of clusters, has been completely modelled using a mixed-level simulator called the Block Oriented Network Simulator (BONeS®), and we have evaluated its performance for a set of practical scenes. Promising results have been observed, including the following:

1. The performance of the shelling technique is a weak function of the scene complexity. The computational complexity of the shelling technique is k × R (k is about 2–5), compared to N × R (N is the total number of patches) for the naive algorithm, where R is the total number of intersection-computation rays.
2. A reasonable speedup has been observed up to 8 clusters. The limiting factors in speedup are workload imbalance, the long latencies of global memory requests, and the limited bandwidth provided by the system and local buses. To achieve higher scalability, further improvement of the front-end system together with a dynamic workload balancing scheme would be necessary.
3. The performance of software intersection computation on an HP720 is about 0.2 M/sec. Per cluster, the radiosity engine provides two orders of magnitude more processing power than this software approach.
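
The complexity figures in the abstract can be made concrete with a rough back-of-the-envelope sketch. The Python below is illustrative only and is not the authors' code: the function names, the patch count N, and the ray count R are hypothetical, "0.2 M/sec" is read here as 200,000 intersection computations per second, and "two orders of magnitude" is approximated as a factor of 100.

```python
# Back-of-the-envelope comparison of intersection-test counts.
# All names and sample values are hypothetical; only the formulas
# N x R and k x R and the quoted rates come from the abstract.

SOFTWARE_RATE = 200_000             # assumed reading of "0.2 M/sec" (tests per second)
ENGINE_RATE = 100 * SOFTWARE_RATE   # assumed reading of "two orders of magnitude" more


def naive_tests(num_patches: int, num_rays: int) -> int:
    """Tests when every ray is checked against every patch (N x R)."""
    return num_patches * num_rays


def shelling_tests(k: int, num_rays: int) -> int:
    """Tests when each ray is checked against about k patches (k x R)."""
    return k * num_rays


def reduction_factor(num_patches: int, k: int) -> float:
    """Ratio of naive to shelling test counts; R cancels, leaving N / k."""
    return num_patches / k


def seconds_needed(num_tests: int, tests_per_second: int) -> float:
    """Time to perform num_tests at a fixed intersection rate."""
    return num_tests / tests_per_second


if __name__ == "__main__":
    N = 10_000      # hypothetical number of patches
    R = 1_000_000   # hypothetical number of intersection-computation rays
    for k in (2, 5):  # the abstract reports k of about 2-5
        tests = shelling_tests(k, R)
        print(f"k={k}: naive={naive_tests(N, R)}, shelling={tests}, "
              f"reduction={reduction_factor(N, k)}, "
              f"software={seconds_needed(tests, SOFTWARE_RATE)} s, "
              f"engine={seconds_needed(tests, ENGINE_RATE)} s per cluster")
```

Because R cancels in the ratio, the benefit under these assumptions depends only on N / k, which is consistent with the claim that performance is a weak function of scene complexity.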
