Reliable management of checkpointing and application data in opportunistic grids

Raphael Y De Camargo, Fabio Kon, Fernando Castor

doi:10.1007/s13173-010-0016-0

Abstract

Opportunistic computational grids use idle processor cycles from shared machines to enable the execution of long-running parallel applications. Besides computational power, these applications may also consume and generate large amounts of data, requiring an efficient data storage and management infrastructure. In this article, we present an integrated middleware infrastructure that enables the use of not only idle processor cycles, but also unused disk space of shared machines. Our middleware enables the reliable distributed storage of application data in the shared machines in a redundant and fault-tolerant way. A checkpointing-based mechanism monitors the execution of parallel applications, saves periodic checkpoints in the shared machines, and, in case of node failures, supports application migration across heterogeneous grid nodes. We evaluate the feasibility of our middleware using experiments and simulations. Our evaluation shows that the proposed middleware yields important improvements in the reliability of grid data management while imposing a low performance overhead.

Highlights

Opportunistic computational grids [15, 16, 20, 27] use idle resources from shared commodity machines to execute applications that need large amounts of computational power.
We present and evaluate an integrated middleware system, based on the InteGrade [20] grid system and the OppStore [10] distributed storage system, that enables the usage of both idle processor cycles and unused disk space of shared machines.
Our work encompasses several research areas, but we focus here on the literature related to the original contributions of this article, i.e., management of checkpoints of parallel applications, distributed data storage, and grid data management.

Summary

Introduction

Opportunistic computational grids [15, 16, 20, 27] use idle resources from shared commodity machines to execute applications that need large amounts of computational power. The most important difference between opportunistic grids and dedicated ones is that machines will very often fail, become inaccessible, or change from idle to occupied unexpectedly. This may compromise the middleware infrastructure, the data stored in the nodes that became unavailable, and the execution of grid applications. We present and evaluate an integrated middleware system, based on the InteGrade [20] grid system and the OppStore [10] distributed storage system, that enables the usage of both idle processor cycles and unused disk space of shared machines. The contributions include: (1) development of an integrated middleware infrastructure that enables the use of both idle processor cycles and unused disk space of shared machines in opportunistic grids; and (2) evaluation, through experiments and simulations, of the feasibility of using the idle disk space of grid machines for storage of checkpointing and application data.
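The effect of frequent machine departures on stored data can be illustrated with a back-of-the-envelope calculation. The Python sketch below assumes plain replication on machines that are each reachable independently with probability p. Both assumptions, and the function names, are illustrative only; they are not taken from InteGrade or OppStore, whose actual redundancy scheme is not described in this summary.

```python
def replica_availability(p: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable.

    Assumes each copy sits on a different machine and that machines are
    reachable independently with probability `p` (a simplifying assumption).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # The data is lost only if every copy is unreachable at the same time.
    return 1.0 - (1.0 - p) ** replicas


def replicas_needed(p: float, target: float) -> int:
    """Smallest number of copies whose combined availability reaches `target`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while replica_availability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: machines that are idle and reachable half the time.
    print(replica_availability(0.5, 3))
    print(replicas_needed(0.5, 0.99))
```

Under these assumptions, machines that are reachable only half of the time need several copies before the data becomes reliably retrievable, which is why redundancy is central to storing checkpoints and application data on shared machines.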

Related work

Distributed data storage and management

Storage of checkpointing data

The middleware architecture

Reliable distributed storage

CDRM organization

Data storage and retrieval

Data management

Reliable execution of parallel applications

Management of application data

Checkpointing mechanism

Checkpoint storage

Middleware evaluation

Data availability with node departures

Network usage for fragment maintenance

Storage of checkpoints from parallel applications

Findings

Conclusions

Journal: Journal of the Brazilian Computer Society	Publication Date: Jul 28, 2010
Citations: 43	License type: CC BY 2.0
