Abstract

Point-based global illumination (PBGI) is a popular rendering method in visual effects and motion picture production. The algorithm models the 3D scene as a dense point cloud whose points act as cache records for light transport simulation. Structured as a tree, this cache supports the image synthesis stage through massive adaptive tree-cut searches, together with the projection of these cuts onto the many receiver shading points (i.e., unprojected pixels in 3D space), where visibility is resolved with a receiver-specific z-buffer. Both operations are time-consuming, but they can be formulated for efficient parallel execution, in particular on wide-SIMD hardware. For the PBGI tree traversal, we introduce a single-receiver traversal scheme for incoherent receivers, a packet traversal scheme for coherent receivers, and logic for dynamically switching between the two at run time. For the per-receiver rasterization, we propose three different vectorization strategies, one each for near-, mid-, and far-distance points. We conducted experiments on an Intel Many Integrated Core (MIC) architecture and report results on several scenes, showing that, compared with non-vectorized execution, a speedup of up to 9× can be achieved during the traversal step and nearly 2.5× during the rasterization step, without quality degradation.
