Abstract

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed for machines with single-core CPUs, cannot efficiently utilize the parallelism available on multi-core processors. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write parallel code. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring the programmer’s expertise in parallel programming. Across five different benchmarks, Rubus achieves an average speedup of 34.54 times over Java on a basic GPU having only 96 cores, while for a matrix multiplication benchmark it achieves an average execution speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.

Highlights

  • Since the beginning of computing, computer users have desired higher computing speed to perform tasks previously considered near impossible

  • If the analyzer establishes that a loop can be parallelized, it is checked for triviality. A trivial loop is passed to the phase where live variable analysis determines the arguments for the kernel, and a kernel is generated in Open Computing Language (OpenCL) from the body of the loop (see the sketch after this list)

  • This paper presents the design and implementation of a new compiler, Rubus, for seamless parallelism
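
The highlighted analysis pipeline can be pictured with a short sketch. The Java code below is a hypothetical illustration only: the names (LoopInfo, canParallelize, isTrivial, liveVariables, emitOpenCLKernel) are invented for exposition and do not correspond to Rubus's actual internals, and the emitted kernel text is a stand-in for whatever Rubus would generate from a real loop body.

```java
// Hypothetical sketch of the highlighted analysis pipeline; all names are
// illustrative and are not the real Rubus implementation.
import java.util.List;

final class ParallelizationSketch {

    /** Placeholder for a loop recovered from the bytecode's control flow graph. */
    static final class LoopInfo {
        List<String> body;                     // loop-body instructions (illustrative)
        boolean hasCrossIterationDependency;   // result of dependency analysis
    }

    /** A loop is a parallelization candidate only if its iterations are independent. */
    static boolean canParallelize(LoopInfo loop) {
        return !loop.hasCrossIterationDependency;
    }

    /** "Trivial" stands here for a simple counted loop with a known iteration space. */
    static boolean isTrivial(LoopInfo loop) {
        return true; // the real check is omitted in this sketch
    }

    /** Live variable analysis yields the values the kernel must receive as arguments. */
    static List<String> liveVariables(LoopInfo loop) {
        return List.of("in", "out", "n"); // illustrative result
    }

    /** Emit an OpenCL kernel whose body mirrors the loop body, one work-item per iteration. */
    static String emitOpenCLKernel(LoopInfo loop, List<String> args) {
        return "__kernel void loop0(__global const float* in, __global float* out) {\n"
             + "    int i = get_global_id(0);\n"
             + "    out[i] = in[i] * in[i];\n"
             + "}\n";
    }

    static void compileLoop(LoopInfo loop) {
        if (canParallelize(loop) && isTrivial(loop)) {
            List<String> args = liveVariables(loop);
            String kernel = emitOpenCLKernel(loop, args);
            System.out.println(kernel); // in Rubus this would feed kernel-launcher generation
        }
    }
}
```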


Summary

Introduction

Since the beginning of computing, computer users have desired higher computing speed to perform tasks previously considered near impossible. Traditional programming languages, which were designed to deal with single-core machines, cannot fully utilize multi-core CPUs and GPUs efficiently. To this end, a few low-level languages, including Compute Unified Device Architecture (CUDA) [3] and Open Computing Language (OpenCL), have been introduced to exploit the parallelism capabilities of the underlying hardware. The main shortcoming of these languages is that the programmer needs to specify all the complex details about how to distribute the code on multiple cores for parallel execution. Rubus relieves the programmer of the burden of learning new languages, rewriting code and specifying the low-level details needed to parallelize it. It aims to provide seamless data-level parallelism by exploiting the massive computational power of GPUs and multi-core CPUs, without writing any extra code.
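To make the contrast concrete, the sketch below shows, in Java, the kind of sequential loop Rubus targets (similar in spirit to the paper's Squares benchmark) alongside the kind of OpenCL C kernel a programmer would otherwise have to write and wire up by hand. Both snippets are illustrative assumptions for exposition, not code produced by or taken from Rubus.

```java
// Illustrative comparison: a sequential Java loop vs. the OpenCL kernel a programmer
// would otherwise write by hand. Rubus's goal is to make the second part unnecessary.
final class SquaresExample {

    // Data-parallel Java loop (in the spirit of the Squares benchmark):
    // every iteration is independent, so each can map to one GPU work-item.
    static float[] squares(float[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * in[i];
        }
        return out;
    }

    // The kind of OpenCL C kernel that would otherwise be written manually;
    // get_global_id(0) gives each work-item its own array index.
    static final String SQUARES_KERNEL =
        "__kernel void squares(__global const float* in, __global float* out) {\n" +
        "    int i = get_global_id(0);\n" +
        "    out[i] = in[i] * in[i];\n" +
        "}";
}
```

Because every iteration writes a distinct element of out and reads no value written by another iteration, each iteration can safely become one GPU work-item; establishing this independence is exactly the job of the dependency analysis stage described in the methodology.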

Methodology
Reading bytecode
Basic block
Deriving control flow graph
Loop detection
Natural loop and loop nesting
Finding trivial loop
Dependency analysis
Header node for outer loop
Loop extraction
Live variable analysis
Kernel generation in OpenCL
Kernel launcher generation
Kernel merging
Performance evaluation
Experiments setup and hardware specifications
Matrix multiplication
Convolution
Mandelbrot set
N-body simulation
Squares
Real-time movie convolution
Related work
Java tools for parallelism
Non-Java tools for parallelism
Conclusion
Findings
Limitations and future work
