Abstract

Simulation studies are useful for testing novel statistical methods. From a computational perspective, they constitute embarrassingly parallel tasks. We describe parallelization techniques in the programming language R that can be used on Amazon's cloud-based infrastructure. After a short conceptual overview of the parallelization techniques in R, we provide a hands-on tutorial on how the doRedis package, in conjunction with the Redis server, can be used on Amazon Web Services, specifically with spot fleets. The tutorial proceeds in seven steps: (1) starting up an EC2 instance, (2) installing a Redis server, (3) using doRedis with a local worker, (4) using doRedis with a remote worker, (5) setting up instances that automatically fetch tasks from a specific master, (6) using spot fleets, and (7) shutting down the instances. As a basic example, we show how these techniques can be used to assess the effects of heteroscedasticity on the equal-variance t-test. Furthermore, we address several advanced issues, such as multiple conditions, cost management, and chunking.
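
To make the basic example concrete, the following is a minimal sketch in base R, not the authors' code, of a simulation that estimates the Type I error rate of the equal-variance t-test under heteroscedasticity. The helper name one_rep, the sample sizes, the standard deviations, and the number of replications are all illustrative assumptions. The replications run sequentially here; because each one is independent of the others, the same loop could in principle be dispatched to workers through a parallel backend.

```r
# Hypothetical helper (not from the paper): one replication returns TRUE if
# the equal-variance t-test rejects H0 at level alpha, although both
# population means are equal (so every rejection is a Type I error).
one_rep <- function(n1, n2, sd1, sd2, alpha = 0.05) {
  x <- rnorm(n1, mean = 0, sd = sd1)
  y <- rnorm(n2, mean = 0, sd = sd2)
  t.test(x, y, var.equal = TRUE)$p.value < alpha
}

set.seed(1)      # fixed seed so the sketch is reproducible
n_reps <- 2000   # assumed number of replications

# Each replication is independent of the others, which is what makes the
# task embarrassingly parallel. Here they are simply run one after another.
rejections <- vapply(
  seq_len(n_reps),
  function(i) one_rep(n1 = 10, n2 = 40, sd1 = 4, sd2 = 1),
  logical(1)
)

# Proportion of false rejections = estimated Type I error rate.
type1_error <- mean(rejections)
print(type1_error)
```

In this assumed configuration the smaller group has the larger variance, so the pooled variance understates the variability of the mean difference and the estimated rejection rate should come out above the nominal 0.05.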
