Abstract

Opportunistic computational grids use idle processor cycles from shared machines to enable the execution of long-running parallel applications. Besides computational power, these applications may also consume and generate large amounts of data, requiring an efficient data storage and management infrastructure. In this article, we present an integrated middleware infrastructure that enables the use of not only idle processor cycles, but also unused disk space of shared machines. Our middleware enables the reliable distributed storage of application data in the shared machines in a redundant and fault-tolerant way. A checkpointing-based mechanism monitors the execution of parallel applications, saves periodic checkpoints in the shared machines, and, in case of node failures, supports application migration across heterogeneous grid nodes. We evaluate the feasibility of our middleware using experiments and simulations. Our evaluation shows that the proposed middleware significantly improves the reliability of grid data management while imposing a low performance overhead.

Highlights

  • Opportunistic computational grids [15, 16, 20, 27] use idle resources from shared commodity machines to execute applications that need large amounts of computational power

  • We present and evaluate an integrated middleware system, based on the InteGrade [20] grid system and OppStore [10] distributed storage system, that enables the usage of both idle processor cycles and unused disk space of shared machines

  • Our work encompasses several research areas, but we focus here on the literature related to the original contributions of this article, i.e., management of checkpoints of parallel applications, distributed data storage, and grid data management


Summary

Introduction

Opportunistic computational grids [15, 16, 20, 27] use idle resources from shared commodity machines to execute applications that need large amounts of computational power. The most important difference between opportunistic grids and dedicated ones is that machines in opportunistic grids very often fail, become inaccessible, or change from idle to occupied unexpectedly. This may compromise the middleware infrastructure, the data stored in the nodes that became unavailable, and the execution of grid applications. We present and evaluate an integrated middleware system, based on the InteGrade [20] grid system and the OppStore [10] distributed storage system, that enables the usage of both idle processor cycles and unused disk space of shared machines. The contributions of this article include the development of an integrated middleware infrastructure that enables the use of both idle processor cycles and unused disk space of shared machines in opportunistic grids, and the evaluation, through experiments and simulations, of the feasibility of using the idle disk space of grid machines for storage of checkpointing and application data.

Related work
Distributed data storage and management
Storage of checkpointing data
The middleware architecture
Reliable distributed storage
CDRM organization
Data storage and retrieval
Data management
Reliable execution of parallel applications
Management of application data
Checkpointing mechanism
Checkpoint storage
Middleware evaluation
Data availability with node departures
Network usage for fragment maintenance
Storage of checkpoints from parallel applications
Findings
Conclusions
