Fast sorting for exact OIT of complex scenes

Pyarelal Knowles, Fabio Zambetta, Geoff Leach

doi:10.1007/s00371-014-0956-z

Abstract

Exact order-independent transparency (OIT) techniques capture all fragments during rasterization. The fragments are then sorted per-pixel by depth and composited in order using alpha transparency. The sorting stage is a bottleneck for high depth complexity scenes, taking 70–95 % of the total time for those investigated. In this paper, we show that typical shader-based sorting speed is impacted by local memory latency and occupancy. We present and discuss the use of both registers and an external merge sort in register-based block sort to better use the memory hierarchy of the GPU for improved OIT rendering performance. This approach builds upon backwards memory allocation, achieving an OIT rendering speed up to 1.7× that of the best previous method and 6.3× that of the common straightforward OIT implementation. In some cases, the sorting stage is reduced so that it is no longer the dominant OIT component.
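As an illustration of the per-pixel sort-and-composite step the abstract describes, here is a minimal CPU sketch in Python. It is not the paper's GPU implementation and does not show register-based block sort or backwards memory allocation. The fragment layout `(depth, rgb, alpha)`, the function name `composite_pixel`, the convention that larger depth means farther from the camera, and the use of non-premultiplied "over" blending are all assumptions made for this example.

```python
from typing import Iterable, Tuple

Color = Tuple[float, float, float]
# Hypothetical fragment layout for this sketch: (depth, rgb, alpha).
Fragment = Tuple[float, Color, float]


def composite_pixel(fragments: Iterable[Fragment],
                    background: Color = (0.0, 0.0, 0.0)) -> Color:
    """Sort one pixel's fragments by depth and blend them back to front."""
    # Assumption: larger depth is farther away, so sort in descending depth.
    ordered = sorted(fragments, key=lambda f: f[0], reverse=True)

    r, g, b = background
    for _depth, (fr, fg, fb), alpha in ordered:
        # Standard "over" operator with non-premultiplied colour.
        r = fr * alpha + r * (1.0 - alpha)
        g = fg * alpha + g * (1.0 - alpha)
        b = fb * alpha + b * (1.0 - alpha)
    return (r, g, b)


# Fragments arrive in arbitrary rasterization order; the sort restores depth order.
near_red = (0.2, (1.0, 0.0, 0.0), 0.5)
far_blue = (0.8, (0.0, 0.0, 1.0), 0.5)
pixel = composite_pixel([near_red, far_blue])
```

Because the fragments are sorted before blending, the result does not depend on the order in which they were captured, which is what makes the technique order-independent. The sort is the only step whose cost grows faster than linearly with the number of fragments per pixel, consistent with the abstract's observation that sorting dominates for high depth complexity.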

Full Text