Abstract

The unified shader array is the computational core of a unified-shader GPU, and the shader core is its basic unit. To support large-scale thread-level parallelism, a GPU contains many streaming processors, each composed of shader cores, which execute threads in SIMD or SIMT fashion. GPUs also give each streaming processor a large register file to reduce the cost of context switching, so register file management has a significant effect on GPU performance. In a traditional GPU, each shader core has its own private register file. This simplifies hardware register management, but it limits how far the shader core can scale and leaves registers underutilized. Based on a domestic special-purpose GPU, this paper analyzes the traditional GPU register scheduling strategy and designs a register file shared across the unified shader array, managed purely in hardware, to reduce the cost of the shader cores. The implementation is described in detail, including dynamic allocation and reclamation of register files, allocation of multiple registers to thread bundles, detection and handling of bank conflicts, and the associated register allocation and release instructions. Finally, we implement the design in RTL and analyze the experimental results, which show that the design meets its expected goals in both simulation and logic synthesis.
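
As a rough illustration of the ideas named in the abstract (a shared register pool, per-bundle allocation and release, and bank-conflict detection), the following is a minimal software sketch. It is not the paper's hardware design: the class name `SharedRegisterFile`, the free-list policy, and the interleaved mapping `bank = register index mod number of banks` are all assumptions made for this example.

```python
class SharedRegisterFile:
    """Toy model of one register pool shared by several shader cores.

    Hypothetical illustration only; names and policies are assumptions.
    """

    def __init__(self, num_regs, num_banks):
        self.num_banks = num_banks
        self.free = list(range(num_regs))  # free physical registers
        self.owned = {}                    # bundle id -> registers it holds

    def allocate(self, bundle_id, count):
        """Give `count` registers to a thread bundle.

        Returns the register list, or None if the bundle already holds
        registers or the pool cannot satisfy the request.
        """
        if bundle_id in self.owned or count > len(self.free):
            return None
        regs, self.free = self.free[:count], self.free[count:]
        self.owned[bundle_id] = regs
        return regs

    def release(self, bundle_id):
        """Return a finished bundle's registers to the free pool."""
        self.free.extend(self.owned.pop(bundle_id, []))

    def bank_of(self, reg):
        # Assumed interleaved mapping: consecutive registers hit consecutive banks.
        return reg % self.num_banks

    def conflicts(self, regs):
        """Return the set of banks accessed by more than one register in `regs`."""
        seen, clash = set(), set()
        for reg in regs:
            bank = self.bank_of(reg)
            if bank in seen:
                clash.add(bank)
            seen.add(bank)
        return clash
```

In this sketch, a request that exceeds the remaining pool is simply refused, so the caller would have to stall the bundle until another bundle releases its registers. A non-empty result from `conflicts` marks an access that a banked register file would have to serialize.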
