Hybrid cloud and cluster computing paradigms for life science applications

Judy Qiu,Bingjing Zhang,Jong Youl Choi,Yang Ruan,Hui Li,Seung-Hee Bae,Jaliya Ekanayake,Saliya Ekanayake,Geoffrey Fox,Thilina Gunarathne,Adam Hughes,Tak-Lon Wu

doi:10.1186/1471-2105-11-s12-s3


Open Access

https://doi.org/10.1186/1471-2105-11-s12-s3


Abstract

Background: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability to some areas such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system, Twister.

Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.

Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many Life Sciences applications.

Methods: We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

Highlights

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications
Cloud computing [1] is at the peak of the Gartner technology hype curve [2], but there are good reasons to believe that it is for real and will be important for large scale scientific computing: 1) Clouds are the largest scale computer centers constructed, and so they have the capacity to be important to large-scale science problems as well as those at small scale
We focus on the MapReduce programming model [18], which can be implemented on any cluster using the open source Hadoop [19] software for Linux or the Microsoft Dryad system [20,21] for Windows

Summary

Introduction

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. They have limited applicability to some areas such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. There are 3 major vendors of clouds (Amazon, Google, and Microsoft) and many other infrastructure and software cloud technology vendors, including Eucalyptus Systems, which spun off from UC Santa Barbara HPC research. Much scientific computing can be performed on clouds [11], but there are some well-documented problems with using clouds, including: 1) The centralized computing model for clouds runs counter to the principle of “bringing the computing to the data”, and bringing the “data to a commercial cloud facility” may be slow and expensive.
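The iterative structure mentioned above can be illustrated with a small sketch. The example below is not Twister, Hadoop, or MPI code; it is a single-process Python illustration, under the assumption that one-dimensional K-means is a fair stand-in for the iterative data mining problems the summary refers to. All function names are hypothetical. Each pass runs a map step (assign every point to its nearest centroid) and a reduce step (average the points per centroid), and the whole map/reduce pair is repeated until the centroids stop moving. A basic MapReduce runtime would launch each pass as a separate job, which is one way to see why such loops are costly there.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: (point - centroids[i]) ** 2)
    return nearest, (point, 1)


def reduce_mean(pairs):
    """Reduce step: combine (value, count) pairs for one key into a mean."""
    total = sum(value for value, _ in pairs)
    count = sum(n for _, n in pairs)
    return total / count


def kmeans_iterative_mapreduce(points, centroids, max_iter=100, tol=1e-9):
    """Repeat map + reduce until the centroids move by at most `tol`.

    Returns the final centroids and the number of map/reduce passes used.
    """
    centroids = list(centroids)
    passes = 0
    for passes in range(1, max_iter + 1):
        # Map phase: the input points never change between passes,
        # only the (small) list of centroids does.
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centroids)
            groups[key].append(value)

        # Reduce phase: one new centroid per key; keep the old centroid
        # if no point was assigned to it.
        new_centroids = [
            reduce_mean(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]

        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, passes


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    final_centroids, n_passes = kmeans_iterative_mapreduce(data, [0.0, 5.0])
    print(final_centroids, n_passes)
```

In this sketch the input points stay fixed across passes while only the centroids are updated, so a runtime that reloads the points for every pass repeats work that an MPI program or an iterative MapReduce runtime could avoid.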



Journal: BMC Bioinformatics	Publication Date: Dec 1, 2010
Citations: 86	License type: CC BY 2.0


