The Case for Polymorphic Registers in Dataflow Computing

Cătălin Bogdan Ciobanu, Donatella Sciuto, Georgi Gaydadjiev, Christian Pilato

doi:10.1007/s10766-017-0494-1

Abstract

Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput by up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9 × 9 or larger mask sizes, even in bandwidth-constrained systems.

Highlights

Heterogeneous High-Performance Computing (HPC) systems are becoming increasingly popular for data processing
Our results suggest that Polymorphic Register Files (PRFs) with a large number of lanes are more efficient when the mask size is large
A comparison of the PRF throughput with the NVIDIA Tesla C2050 Graphics Processing Unit (GPU)
This article analyzed the impact of Polymorphic Register Files (PRFs) on state-of-the-art dataflow computing systems

Summary

Introduction

Heterogeneous High-Performance Computing (HPC) systems are becoming increasingly popular for data processing. In the Cell processor, for example, the Power Processor Element (PPE) runs the operating system and the application's control sections, while the Synergistic Processor Elements (SPEs) are designed to excel in data-intensive computations, executing the most time-consuming parts of the applications. Another approach is to combine General Purpose Processors (GPPs) with specialized accelerators, implemented as custom chips or on reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs). The Maxeler MaxWorkstation [2] combines Intel x86 processors with multiple dataflow engines powered by, e.g., Xilinx Virtex-6 FPGA devices. This system adopts the dataflow computational model and organizes the data into highly regular streams flowing through the functions implemented in hardware, obtaining efficient implementations for streaming applications [3,4].

To define a register in a PRF, we need to specify its position (i.e., the base), its shape (e.g., rectangle, row, column, main or secondary diagonal), its dimensions (i.e., horizontal and vertical length), and its data type (e.g., integer or floating-point, 8/16/32/64 bits).
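
As an illustration of the register parameters listed above (base, shape, dimensions, data type), the following Python sketch models a register descriptor and enumerates the storage cells it would cover. It is only a hypothetical model for intuition, not the hardware interface described in the article: the names `RegisterDescriptor`, `Shape`, and `covered_cells`, the (row, column) form of the base, and the convention that a diagonal takes its length from the horizontal length are all assumptions made here.

```python
from dataclasses import dataclass
from enum import Enum


class Shape(Enum):
    RECTANGLE = "rectangle"
    ROW = "row"
    COLUMN = "column"
    MAIN_DIAGONAL = "main_diagonal"
    SECONDARY_DIAGONAL = "secondary_diagonal"


@dataclass(frozen=True)
class RegisterDescriptor:
    """Hypothetical description of one logical register in a 2D storage array."""
    base_row: int        # assumed: base given as a (row, column) coordinate
    base_col: int
    shape: Shape
    horizontal_len: int  # number of columns spanned
    vertical_len: int    # number of rows spanned
    dtype: str           # e.g. "int32" or "float64"; not interpreted here


def covered_cells(reg):
    """Return the (row, col) cells of the 2D storage that the register covers."""
    r, c = reg.base_row, reg.base_col
    if reg.shape is Shape.RECTANGLE:
        return [(r + i, c + j)
                for i in range(reg.vertical_len)
                for j in range(reg.horizontal_len)]
    if reg.shape is Shape.ROW:
        return [(r, c + j) for j in range(reg.horizontal_len)]
    if reg.shape is Shape.COLUMN:
        return [(r + i, c) for i in range(reg.vertical_len)]
    # Assumption: diagonals use horizontal_len as their element count.
    if reg.shape is Shape.MAIN_DIAGONAL:
        return [(r + k, c + k) for k in range(reg.horizontal_len)]
    if reg.shape is Shape.SECONDARY_DIAGONAL:
        # Assumption: the secondary diagonal runs down and to the left.
        return [(r + k, c - k) for k in range(reg.horizontal_len)]
    raise ValueError(f"unsupported shape: {reg.shape}")


if __name__ == "__main__":
    block = RegisterDescriptor(1, 2, Shape.RECTANGLE, 3, 2, "float32")
    print(covered_cells(block))
```

In this model, redefining a register amounts to creating a new descriptor over the same storage; no data is copied, which is the intuition behind registers whose shape and size can change at runtime.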

Journal: International Journal of Parallel Programming
Publication Date: May 10, 2017
Citations: 3
License type: open-access
