Abstract
It is often useful to rerun a command-line R script with a slight change in its parameters: a new set of parameters for a simulation, a different dataset to process, and so on. The R package batch provides a means to pass multiple command-line options into R easily, including vectors of values in the usual R format. The same script can then be set up to run in parallel via different command-line arguments. The package also simplifies this parallel batching by letting one use R, with an R-like syntax for arguments, to spread a script across a cluster or a local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate the results of multiple processes run on a cluster.
Highlights
With multicore CPUs prevalent in even the most budget computers, clusters becoming more cost-effective and realistic in smaller scenarios, and cloud computing becoming more common, it is useful to have multiple ways to parallelize one's code.
We also provide a means to make the batching more user-friendly.
We have presented the R package batch, which allows a user to specify command-line options to R script files.
Summary
With multicore CPUs (central processing units) prevalent in even the most budget computers, clusters becoming more cost-effective and realistic in smaller scenarios, and cloud computing becoming more common, it is useful to have multiple ways to parallelize one's code. Parameters can be passed in as numerical values, strings, or even vectors of values. Using these command-line arguments, an alternative and intuitive way to add parallelism to R code is to run the same R script multiple times. Rather than running scripts in parallel by wrapping them in a function to be applied across a list (as in multicore or snowfall), one runs the script directly with different command-line values. This can be more intuitive, for example, when a dataset does not fit in memory and must be explicitly chopped into pieces anyway. The R package batch provides a means to pass parameter values into scripts and run them in parallel on a cluster or locally on any operating system. The package is available from CRAN at http://CRAN.R-project.org/package=batch
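To make the idea of passing numbers, strings, and vectors on the command line concrete, here is a minimal base-R sketch. It is not the batch package's implementation or API: the helper name `parse_name_value_args` and the alternating name/value layout of the arguments are assumptions made for illustration only.

```r
# Hypothetical helper (not part of the batch package): turn alternating
# name/value tokens, e.g.  seed 1 n "c(10, 20)" label run1,
# into a named list of R values.
parse_name_value_args <- function(args) {
  if (length(args) == 0) {
    return(list())
  }
  stopifnot(length(args) %% 2 == 0)  # expect complete name/value pairs

  arg_names  <- args[seq(1, length(args), by = 2)]
  arg_values <- args[seq(2, length(args), by = 2)]

  parsed <- lapply(arg_values, function(v) {
    # Try to read the token as an R expression (number, vector, ...);
    # if that fails, or does not yield a plain atomic value, keep the string.
    value <- tryCatch(
      eval(parse(text = v), envir = baseenv()),
      error = function(e) v
    )
    if (is.atomic(value) && length(value) > 0) value else v
  })
  names(parsed) <- arg_names
  parsed
}

# Demonstration with a literal vector; inside a script one would pass
# commandArgs(trailingOnly = TRUE) instead.
example <- parse_name_value_args(
  c("seed", "1", "n", "c(10, 20)", "label", "run1")
)
str(example)
```

Under these assumptions, a script could be launched several times with different values, for instance `Rscript sim.R seed 1 n "c(10, 20)"` and `Rscript sim.R seed 2 n "c(10, 20)"`, with each run handling its own piece of the work. Because the sketch evaluates command-line text as R code, it is suitable only for trusted input.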