Abstract

Exact order-independent transparency (OIT) techniques capture all fragments during rasterization. The fragments are then sorted per-pixel by depth and composited in order using alpha transparency. The sorting stage is a bottleneck for high depth complexity scenes, taking 70---95 % of the total time for those investigated. In this paper, we show that typical shader-based sorting speed is impacted by local memory latency and occupancy. We present and discuss the use of both registers and an external merge sort in register-based block sort to better use the memory hierarchy of the GPU for improved OIT rendering performance. This approach builds upon backwards memory allocation, achieving an OIT rendering speed up to 1.7 $$\times $$ × that of the best previous method and 6.3 $$\times $$ × that of the common straight forward OIT implementation. In some cases, the sorting stage is reduced to no longer be the dominant OIT component.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call