Just-In-Time Compilation of NumPy Vector Operations

Johannes Lund, Brian Vinter, Simon A F Lund, Mads R B Kristensen

doi:10.7603/s40601-013-0021-1

Abstract

In this paper, we introduce JIT compilation for the high-productivity framework Python/NumPy in order to boost performance significantly. The JIT compilation of Python/NumPy is completely transparent to the user: the runtime system automatically JIT compiles and executes the NumPy instructions encountered in a Python application. In other words, we introduce a framework that provides the high productivity of Python while maintaining the high performance of a low-level, compiled language.

We transform NumPy vector instructions into an Abstract Syntax Tree (AST) representation that creates the basis for further optimizations. From the AST we auto-generate C code, which we compile into computational kernels and execute. These kernels incorporate temporary array removal and loop fusion, which are the main contributors to the achieved speedups. In order to amortize the overhead of kernel creation, we also implement a cache for the compiled kernels.

We evaluate the JIT compilation by executing several scientific computing benchmarks on an AMD. Compared to NumPy, we achieve speedups of a factor of 4.72 for an N-Body application and 7.51 for a Jacobi Stencil application executing on a single CPU core.
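To make the two optimizations named in the abstract concrete, the following minimal sketch contrasts an unfused evaluation of a * b + c with a fused one. It is an illustration only, not the paper's implementation: plain Python lists stand in for NumPy arrays, and the helper names (mul, add, unfused, fused) are hypothetical.

```python
def mul(x, y):
    # Element-wise multiply; allocates a new full-size result, like an unfused vector op.
    return [xi * yi for xi, yi in zip(x, y)]


def add(x, y):
    # Element-wise add; also allocates a new full-size result.
    return [xi + yi for xi, yi in zip(x, y)]


def unfused(a, b, c):
    # Two separate loops over the data; `tmp` is a full-size temporary array.
    tmp = mul(a, b)
    return add(tmp, c)


def fused(a, b, c):
    # One loop and no temporary array: each output element is computed in a single pass.
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i] + c[i]
    return out


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]

# Both strategies compute the same values; only the memory traffic differs.
assert unfused(a, b, c) == fused(a, b, c)
```

The fused version reads each input element once and never materializes the intermediate product, which is the kind of saving that temporary array removal and loop fusion aim for in a generated kernel.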

Highlights

Many scientific algorithms can be expressed using vector operations and linear algebra.
In order to improve the performance of Python/NumPy, we introduce a Just-In-Time (JIT) compiler backend for the NumPy library.
We have implemented a JIT framework for Python/NumPy that allows NumPy instructions to be expressed in an abstract form using Abstract Syntax Trees. This has allowed for a set of optimizations to the computation of NumPy vector operations and enables further optimizations.

Summary

Introduction

Many scientific algorithms can be expressed using vector operations and linear algebra. These are often expressed in specialized high-level languages and libraries, such as the NumPy library for Python. Their performance is often significantly lower than when they are implemented and computed in a low-level language. Expressing the data and calculations efficiently in a low-level language such as C is far from a trivial task. It requires in-depth understanding to implement them efficiently on heterogeneous hardware architectures.

Journal: GSTF Journal on Computing (JoC)
Publication Date: Dec 1, 2013
Citations: 6
License type: CC BY 2.0
