Abstract
In this overview of biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or a cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that substantial up-front effort is required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how they apply to a typical biomedical informatics application pipeline, while remaining general enough to extrapolate to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project, not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in the references.
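The cost and transfer figures above follow from simple arithmetic, which can be sketched as a back-of-the-envelope estimator. This is a minimal illustration, not a pricing tool: the $1.00/hour instance rate and the 100 Mbps link speed used below are illustrative assumptions, not current AWS pricing.

```python
def compute_cost(hours, hourly_rate_usd):
    """Total on-demand cost for one instance running for `hours`."""
    return hours * hourly_rate_usd

def transfer_hours(data_gb, bandwidth_mbps):
    """Hours needed to upload `data_gb` gigabytes over a `bandwidth_mbps` link,
    ignoring latency and protocol overhead."""
    megabits = data_gb * 8 * 1000   # 1 GB = 8,000 megabits
    seconds = megabits / bandwidth_mbps
    return seconds / 3600

# 48 hours on a single instance at an assumed ~$1.00/hour is ~$48,
# consistent with the Bowtie mapping example above.
print(compute_cost(48, 1.00))                 # 48.0

# Uploading 1 TB over an assumed 100 Mbps link takes roughly 22 hours,
# which is why shipping a storage device to AWS can be the faster option.
print(round(transfer_hours(1000, 100), 1))    # 22.2
```

Estimates like these make it easy to decide, before starting a project, whether network upload or physical shipment is the practical route for a given dataset size.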
Highlights
Biomedical research in the post-genome era is intensely data-driven and increasingly integrative as new technologies, such as next- or third-generation sequencing, mass spectrometry, and imaging, are introduced to identify novel biological insights.
The challenge remains to decide how best to take advantage of the flexibility of cloud computing to conduct these and other analyses. The purpose of this overview is three-fold: (1) introduce biomedical cloud computing, (2) provide a concrete methodology detailing how projects are developed on the cloud, and (3) demonstrate cloud computing costs.
For the purposes of this guide, we focus on the use of Amazon Web Services (AWS) as the cloud computing platform and adopt the definition of Vaquero, who states that the cloud is "a large pool of usable and accessible virtualized resources (such as hardware, development platforms, and/or services)."
Summary
Biomedical research in the post-genome era is intensely data-driven and increasingly integrative as new technologies, such as next- or third-generation sequencing, mass spectrometry, and imaging, are introduced to identify novel biological insights. For the purposes of this guide, we focus on the use of AWS as the cloud computing platform and adopt the definition of Vaquero, who states that the cloud is "a large pool of usable and accessible virtualized resources (such as hardware, development platforms, and/or services). These resources can be dynamically re-configured to adjust to a variable load (scale), allowing for optimum resource utilization" [6]. Cloud computing is a commodity service that provides on-demand access to a computational infrastructure and avoids the fixed cost of capital investments in computing hardware, computing maintenance, and personnel.