Data movement, particularly access to main memory, has long been the bottleneck of most computing problems, and ray tracing is no exception. We propose an unconventional solution that combines a ray ordering scheme that minimizes access to the scene data with a large on-chip buffer that acts as near-compute storage and is spread over multiple chips. We demonstrate the effectiveness of our approach by introducing Mach-RT (Many chip - Ray Tracing), a new hardware architecture for accelerating ray tracing. Extending the concept of dual streaming, we reduce main memory accesses to a level that allows the same memory system to service multiple processor chips at the same time. Although a multiple-chip solution might seem to imply increased energy consumption, the reduced memory traffic lets us demonstrate performance increases while maintaining reasonable energy usage compared to academic and commercial architectures. This article extends our previous work (E. Vasiou, K. Shkurko, E. Brunvand, and C. Yuksel, "Mach-RT: A many chip architecture for high-performance ray tracing," in Proc. High-Perform. Graph. Conf., 2019) with a design space exploration of the L3 cache size, a more detailed evaluation of energy and memory performance, a discussion of energy-delay product, and a brief exploration of boards with 16 chips. We also introduce new treelet enqueueing logic for the predictive scheduler.