Abstract

The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows “in the cloud.” Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster.

Highlights

  • Cloud computing provides on-demand, scalable access to resources that manage, process and store data through the use of remote storage services and emulated computing systems, otherwise known as virtual machines, over the Internet

  • As we describe if a job is intractable on a typical workstation, and does not reflect an ongoing increase in workload that would push utilization consistently over approximately 70%, we would consider cloud services before investing in cluster infrastructure

  • One possibility that we explored for this performance differential is the fact that two virtual central processing units (vCPUs) correspond to a single physical core, and the vCPUs are implemented using hyper-threading

Read more

Summary

Introduction

Cloud computing provides on-demand, scalable access to resources that manage, process and store data through the use of remote storage services and emulated computing systems, otherwise known as virtual machines, over the Internet. We limited our benchmarking to services provided by Amazon because of their development of cfncluster,1 an AWS framework that simplifies the creation and management of high performance computing clusters, which are similar to platforms commonly used to run large scale neuroimaging applications.

Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call