Abstract

Scientific research organizations generate several petabytes of data per year through computational science simulations, and these data are often shared across geographically distributed data centers for analysis. One of the major challenges in such distributed environments is failure: hardware, networks, and software can fail at any instant. High-speed, fault-tolerant data transfer frameworks are therefore vital for moving such large datasets efficiently between data centers. In this study, we propose a Bloom filter-based data aware probabilistic fault tolerance (DAFT) mechanism that handles such failures, together with a data and layout aware fault tolerance (DLFT) mechanism that effectively handles the false positive matches of DAFT. We evaluated the data transfer and recovery time overheads that the proposed fault tolerance mechanisms impose on overall data transfer performance. The experimental results demonstrate that DAFT and DLFT exhibit a maximum of 10% and a minimum of 2% recovery time overhead at 80% and 20% fault points, respectively, while the overhead on the overall data transfer rate is minimal to negligible.
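
To make the core idea concrete, the sketch below shows how a Bloom filter can record completed object transfers and drive recovery after a fault. This is a minimal illustration under assumptions, not the paper's implementation; all names (BloomFilter, num_bits, num_hashes, the object IDs) are hypothetical. A Bloom filter never yields false negatives, so a negative query is a guaranteed "not transferred", but a false positive can wrongly mark a pending object as done, which is precisely the case DLFT is designed to catch.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array probed by k hash functions; false positives
    are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Double hashing: derive k probe positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Record an object as transferred (e.g., on completion acknowledgment).
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely not transferred"; True means "probably
        # transferred" -- a false positive here would skip a needed resend.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Recovery after a fault: retransmit only objects the filter does not claim.
transferred = BloomFilter()
for obj_id in ("obj-0002", "obj-0007", "obj-0011"):  # completed out of order
    transferred.add(obj_id)

pending = [f"obj-{i:04d}" for i in range(12)
           if not transferred.might_contain(f"obj-{i:04d}")]
print(pending)  # every object except obj-0002, obj-0007, obj-0011
```

Note that the filter answers a per-object question, so it does not matter in which order the objects completed; the cost of this compactness is the small false positive probability handled by DLFT.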

Highlights

  • Modern scientific experimental facilities such as CERN [1], LIGO [2], and ORNL [3] generate terabytes to petabytes of data every day

  • Although such data transfer tools significantly improve data transfer performance, they complicate fault management: traditional file offset-based fault tolerance mechanisms are unsuitable owing to the out-of-order nature of the transfer (a short sketch after this list makes this concrete)

  • We propose a data aware probabilistic fault tolerance (DAFT) mechanism that employs a Bloom filter, a probabilistic data structure, to efficiently manage faults under out-of-order object transmission
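
The following sketch illustrates why a single offset-based checkpoint breaks down when objects complete out of order. The scenario is assumed for illustration (it is not the authors' code), and a plain Python set stands in for the Bloom filter to keep the example short: a resume watermark can only advance over the contiguous completed prefix, so an object that finished early must still be resent, whereas a per-object membership test such as DAFT's is order-independent.

```python
# Assumed scenario: five equal-size objects in one logical stream.
sizes = {"obj-0": 100, "obj-1": 100, "obj-2": 100, "obj-3": 100, "obj-4": 100}
completed = {"obj-0", "obj-1", "obj-4"}  # obj-4 finished out of order

# Offset-based resume: the watermark may only cover the contiguous prefix,
# since bytes past the first gap are not guaranteed to have arrived.
watermark = 0
for name, size in sizes.items():  # dicts preserve insertion order (3.7+)
    if name not in completed:
        break
    watermark += size
print(watermark)  # 200 -> obj-4 is resent even though it already completed

# Per-object membership (the idea behind DAFT) ignores ordering entirely.
pending = [name for name in sizes if name not in completed]
print(pending)    # ['obj-2', 'obj-3']
```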

Introduction

Modern scientific experimental facilities such as CERN [1], LIGO [2], and ORNL [3] generate terabytes to petabytes of data every day. Nearly every entity in today's world has a digital component or counterpart capable of generating data: devices such as mobile phones, (security) cameras, smart home gadgets, and telemetry devices continuously produce digital content. To provide customers with better quality of service in terms of response time and location-based availability, service providers distribute their data centers geographically across the world, which significantly increases the demand for data transfer among the data centers of such geo-distributed systems. How can the available inter-data-center network bandwidth be fully utilized to satisfy real-time computational requirements?
