Abstract

Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9times 9 or larger mask sizes, even in bandwidth-constrained systems.

Highlights

  • Heterogeneous High-Performance Computing (HPC) systems are becoming increasingly popular for data processing

  • Our results suggest that Polymorphic Register Files (PRFs) with large number of lanes are more efficient when the mask size is large; – A comparison of the PRF throughput with the NVIDIA Tesla C2050 Graphics Processing Unit (GPU)

  • This article analyzed the impact of Polymorphic Register Files (PRFs) on state-of-theart dataflow computing systems

Read more

Summary

Introduction

Heterogeneous High-Performance Computing (HPC) systems are becoming increasingly popular for data processing. The PPE runs the operating system and the application’s control sections, while the SPEs are designed to excel in data-intensive computations, executing the most time-consuming parts of the applications Another approach is to combine General Purpose Processors (GPPs) with specialized accelerators, implemented as custom chips or on reconfigurable devices, like Field-Programmable Gate Arrays (FPGAs). The Maxeler MaxWorkstation [2] combines Intel x86 processors with multiple dataflow engines powered by, e.g., Xilinx Virtex-6 FPGA devices This system adopts the dataflow computational model and organizes the data into highly regular streams flowing through the functions implemented in hardware, obtaining efficient implementations for streaming applications [3,4]. We need to specify its position (i.e. the base), the shape (e.g., rectangle, row, column, main or secondary diagonal), its dimensions (i.e. horizontal and vertical length) and the data type (e.g., integer or floating-point, 8/16/32/64 bits)

Objectives
Methods
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.