Database-Inspired Optimizations for Statistical Analysis

Hannes Mühleisen, Alexander Bertram, Maarten-Jan Kallen

doi:10.18637/jss.v087.i04

Abstract

Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge. However, current tools such as GNU R were not built to handle large data sets efficiently. We propose to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone. This allows us to apply optimization techniques from the columnar data management research field. We have implemented several of these optimizers in Renjin, an open-source execution environment for R scripts targeted at the Java virtual machine. A demonstration of our approach using a series of micro-benchmarks and experiments on complex survey analysis shows orders-of-magnitude improvements in analysis cost.

Highlights

Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge
We have proposed to view execution of statistical analysis programs from a different standpoint, where the analysis script is interpreted not in an imperative order, but as a declaration of intent, with the details of execution left to an execution engine
We have argued that optimization techniques developed for columnar relational databases are interesting candidates to optimize execution of statistical analysis programs

Summary

Introduction

Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge. Optimizations common to relational data management systems, such as automatic intra-query parallelization, are more crucial for performance than ever due to the leveling off in processor clock speeds and the resulting necessary move to multi-core architectures and applications. To date, such techniques have enjoyed limited application to more complex statistical analyses written in imperative languages such as R, C, Fortran, or Python. Rather than adopting the code-centric view of the original GNU R interpreter, we propose to interpret R programs as a declarative, data-centric specification of intent, similar to relational queries. This paradigm shift allows an execution engine to perform advanced reasoning, for example on execution order or computational cost.
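To make the idea of "a declaration of intent" concrete, the following is a minimal sketch in Python of deferred computation combined with one database-style rewrite, selection push-down. It is not Renjin's implementation. The names `Lazy`, `source`, `vmap`, `select`, `optimize`, and `force` are hypothetical and introduced only for illustration, and the rewrite assumes that the mapped function is pure and element-wise.

```python
class Lazy:
    """A deferred vector expression; nothing is computed until force() is called."""

    def __init__(self, op, args):
        self.op = op      # "source", "map", or "select"
        self.args = args  # operands of the operation


def source(values):
    """Leaf node wrapping concrete data."""
    return Lazy("source", (list(values),))


def vmap(fn, node):
    """Deferred element-wise application of fn (assumed pure)."""
    return Lazy("map", (fn, node))


def select(node, indices):
    """Deferred selection of elements by 0-based position."""
    return Lazy("select", (node, list(indices)))


def optimize(node):
    """Selection push-down: rewrite select(map(f, x), idx) as map(f, select(x, idx))."""
    if node.op == "source":
        return node
    if node.op == "map":
        fn, child = node.args
        return Lazy("map", (fn, optimize(child)))
    if node.op == "select":
        child, idx = node.args
        child = optimize(child)
        if child.op == "map":
            fn, grandchild = child.args
            # Select first, so fn only runs on the elements that are needed.
            return Lazy("map", (fn, optimize(select(grandchild, idx))))
        return Lazy("select", (child, idx))
    raise ValueError(f"unknown operation: {node.op}")


def force(node):
    """Evaluate a deferred expression into a plain list."""
    if node.op == "source":
        return list(node.args[0])
    if node.op == "map":
        fn, child = node.args
        return [fn(v) for v in force(child)]
    if node.op == "select":
        child, idx = node.args
        values = force(child)
        return [values[i] for i in idx]
    raise ValueError(f"unknown operation: {node.op}")


# Usage: build the expression first, then let the engine decide how to run it.
plan = select(vmap(lambda v: v * v, source(range(1000))), [1, 3])
result = force(optimize(plan))
```

Evaluating `plan` directly applies the function to all 1000 elements before discarding most of them, whereas the rewritten plan applies it only to the two selected elements. Because the script only describes the intended result, the engine is free to choose the cheaper order.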

Deferred computation in Renjin

Column-oriented database optimization methods

Experiment methodology

Selection push-down

Expression result caching

Parallel execution

Identity removal

Survey analysis experiments

Related Work

Conclusions and outlook

Findings

Future Work


Journal: Journal of Statistical Software
Publication Date: Jan 1, 2018
Citations: 2
License type: cc-by
