Abstract

As various applied sensors have been integrated into embedded devices, the Embedded Graphics Processing Unit (EGPU) has assumed more processing tasks, which requires an EGPU with higher performance. A tile-based EGPU is proposed that can be used in both general-purpose computing and 3D graphics rendering. With a fused, scalable, and hierarchically parallel architecture, the EGPU can process nearly 100 million vertices or fragments and achieves 1 GFLOPS at a clock frequency of 200 MHz. The fused and scalable architecture, composed of a Universal Processing Engine (UPE) and a Graphics Coprocessor Cluster (GCC), allows the EGPU to adapt to various graphics processing scenarios and to render more efficiently. Hierarchical parallelism is implemented via the UPE, and tiling significantly reduces both system memory bandwidth and power consumption. A 0.18 µm technology library is used for timing and power analysis. The area of the proposed EGPU is 6.5 mm × 6.5 mm, and its power consumption is approximately 349.318 mW. Experimental results demonstrate that the proposed EGPU can be used in a System on Chip (SoC) configuration connected to sensors to accelerate processing and strike a balance between performance and cost.
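The headline figures in the abstract can be combined into a few derived metrics. The following is a minimal sketch that uses only the numbers quoted above; the function name `derived_metrics` and the choice of metrics are illustrative assumptions, not quantities reported by the authors.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 200e6      # 200 MHz clock frequency
PEAK_FLOPS = 1e9      # 1 GFLOPS
SIDE_MM = 6.5         # area is 6.5 mm x 6.5 mm
POWER_MW = 349.318    # approximate power consumption


def derived_metrics(clock_hz=CLOCK_HZ, peak_flops=PEAK_FLOPS,
                    side_mm=SIDE_MM, power_mw=POWER_MW):
    """Return illustrative metrics derived from the quoted figures.

    These are back-of-the-envelope values (an assumption of this sketch),
    not results reported in the paper.
    """
    area_mm2 = side_mm * side_mm
    return {
        # Floating-point operations completed per clock cycle.
        "flops_per_cycle": peak_flops / clock_hz,
        # Area in square millimetres.
        "area_mm2": area_mm2,
        # Power per unit area.
        "power_density_mw_per_mm2": power_mw / area_mm2,
        # Throughput per watt.
        "gflops_per_watt": (peak_flops / 1e9) / (power_mw / 1e3),
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.4f}")
```

Under these figures, the design sustains 5 floating-point operations per cycle over an area of 42.25 mm².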

Highlights

  • With the development of embedded applications, various embedded platforms and devices have become an essential part of people’s daily lives [1]

  • After the Universal Processing Engine (UPE) completes the processing of vertex shading programs, the result data are written to the on-chip internal buffer and are further processed by the Primitive Assembler (PA)/clip/viewport/setup/raster unit under the control of the Command Processor (CP) until the final pixel fragments are produced

  • A system verification platform is established to verify the performance of the Embedded Graphics Processing Unit (EGPU) design proposed in this paper

Summary

Introduction

With the development of embedded applications, various embedded platforms and devices have become an essential part of people’s daily lives [1]. Compared with a desktop GPU, an EGPU requires equivalent processing performance, reduced energy consumption, more portable APIs, low cost, and more efficient use of memory bandwidth. These critical factors are tightly interdependent, and together they determine the optimization strategies in EGPU hardware design. All the on-chip processing is performed at high depth and pixel accuracy at the full clock rate without external memory access latency. This approach greatly reduces memory bandwidth and enables modern games and other graphics applications to run with optimized performance [9].
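The bandwidth argument for on-chip tile processing can be illustrated with a toy traffic model. The sketch below is not the authors' design: the tile size `TILE`, the fragment format `(x, y, z, color)`, and both function names are hypothetical, and the model ignores the external traffic needed to bin geometry per tile. It only counts how often a depth or color value crosses the external memory interface.

```python
from collections import defaultdict

TILE = 16  # hypothetical tile edge in pixels; the source does not state a tile size


def immediate_mode_traffic(fragments):
    """Count external accesses when depth and color buffers live off-chip."""
    depth = {}
    accesses = 0
    for x, y, z, _color in fragments:
        accesses += 1  # read the stored depth from external memory
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            accesses += 2  # write the new depth and the new color
    return accesses


def tile_based_traffic(fragments):
    """Count external accesses when each tile is resolved on-chip first."""
    bins = defaultdict(list)
    for frag in fragments:
        x, y = frag[0], frag[1]
        bins[(x // TILE, y // TILE)].append(frag)

    accesses = 0
    for tile_frags in bins.values():
        depth, color = {}, {}  # stand-ins for on-chip tile buffers
        for x, y, z, c in tile_frags:
            if z < depth.get((x, y), float("inf")):
                depth[(x, y)] = z
                color[(x, y)] = c
        accesses += len(color)  # flush each resolved pixel color once
    return accesses


if __name__ == "__main__":
    # Three full-screen layers over a 32x32 region, drawn back to front.
    frags = [(x, y, z, "c") for z in (3.0, 2.0, 1.0)
             for y in range(32) for x in range(32)]
    print(immediate_mode_traffic(frags), tile_based_traffic(frags))
```

In this worst case for overdraw, every fragment passes the depth test, so the immediate-mode model touches external memory three times per fragment, while the tiled model writes each covered pixel once.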

Tile-Based EGPU
Experimental Results
Conclusion
