Abstract

It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass in multiple command line options, including vectors of values in the usual R format, easily into R. The same script can be setup to run things in parallel via different command line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally it provides a means to aggregate the results together of multiple processes run on a cluster.

Highlights

  • With multicore CPUs prevalent in even the most budget computers, clusters becoming more cost effective and realistic in smaller scenarios, and even cloud computing becoming more common, it is useful to have multiple different ways to parallelize ones code

  • We provide a means to make the batching more user friendly as well

  • We have presented the R package batch that allows a user to specifiy command line options to R script files

Read more

Summary

Introduction

With multicore CPUs (central processing units) prevalent in even the most budget computers, clusters becoming more cost effective and realistic in smaller scenarios, and even cloud computing becoming more common, it is useful to have multiple different ways to parallelize ones code. Parameters can be passed in as numerical values, strings, or even vectors of values By using these command line arguments, an alternative and intuitive method of implementing parallelism into your R code is to run the same R script multiple times. Rather than running scripts in parallel by wrapping them in a function to be applied across a list (as in multicore or snowfall), one runs the script directly with different command-line values This can be more intuitive, for example, in situations dealing with large datasets where the dataset does not fit in memory, and things must be explicitly chopped into pieces anyway. The R package batch provides a means to pass parameter values into scripts and run them in parallel on a cluster or locally on any operating system. The package is available from CRAN at http://CRAN.Rproject.org/package=batch

Passing in command line arguments
Running your code in parallel – A simulation case study
Auxiliary functions
Discussion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.