Abstract
In this overview of biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or a cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that substantial up-front effort is required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how they apply to a typical biomedical informatics application pipeline, while remaining general enough to extrapolate to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project, not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in the references.
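The cost and transfer figures above follow from simple arithmetic, which can be sketched as a back-of-the-envelope estimator. This is a minimal illustration, not a pricing tool: the $1.00/hour instance rate and the 100 Mbps link speed used below are illustrative assumptions, not current AWS pricing.

```python
def compute_cost(hours, hourly_rate_usd):
    """Total on-demand cost for one instance running for `hours`."""
    return hours * hourly_rate_usd

def transfer_hours(data_gb, bandwidth_mbps):
    """Hours needed to upload `data_gb` gigabytes over a `bandwidth_mbps` link,
    ignoring latency and protocol overhead."""
    megabits = data_gb * 8 * 1000   # 1 GB = 8,000 megabits
    seconds = megabits / bandwidth_mbps
    return seconds / 3600

# 48 hours on a single instance at an assumed ~$1.00/hour is ~$48,
# consistent with the Bowtie mapping example above.
print(compute_cost(48, 1.00))                 # 48.0

# Uploading 1 TB over an assumed 100 Mbps link takes roughly 22 hours,
# which is why shipping a storage device to AWS can be the faster option.
print(round(transfer_hours(1000, 100), 1))    # 22.2
```

Estimates like these make it easy to decide, before starting a project, whether network upload or physical shipment is the practical route for a given dataset size.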
Highlights
Biomedical research in the post-genome era is intensely data-driven and increasingly integrative as new technologies, such as next- or third-generation sequencing, mass spectrometry, and imaging, are introduced to identify novel biological insights.
The challenge remains to decide how best to take advantage of the flexibility of cloud computing to conduct these and other analyses. The purpose of this overview is three-fold: (1) introduce biomedical cloud computing, (2) provide a concrete methodology detailing how projects are developed on the cloud, and (3) demonstrate cloud computing costs.
For the purposes of this guide, we focus on the use of Amazon Web Services (AWS) as the cloud computing platform and adopt the definition of Vaquero, who states that the cloud is "a large pool of usable and accessible virtualized resources (such as hardware, development platforms, and/or services)."
Summary
Biomedical research in the post-genome era is intensely data-driven and increasingly integrative as new technologies, such as next- or third-generation sequencing, mass spectrometry, and imaging, are introduced to identify novel biological insights. For the purposes of this guide, we focus on the use of AWS as the cloud computing platform and adopt the definition of Vaquero, who states that the cloud is "a large pool of usable and accessible virtualized resources (such as hardware, development platforms, and/or services). These resources can be dynamically re-configured to adjust to a variable load (scale), allowing for optimum resource utilization" [6]. Cloud computing is a commodity service that provides on-demand access to a computational infrastructure and avoids the fixed cost of capital investments in computing hardware, computing maintenance, and personnel.