Abstract

Many systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As data volumes grow enormously, providing efficient, scalable and reliable storage solutions has become one of the major issues for scientific computing. Big data systems typically store their data in a Distributed File System (DFS), which builds a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS used in big data systems, and we introduce Event-B as a formal method for modeling it. Event-B is a mature formal method that has been applied in industrial projects in a number of domains, such as automotive, transportation, space, business information and medical devices. We also propose Rodin as the modeling tool for Event-B: it integrates modeling and proving, and since the Rodin platform is open source, it supports a large number of plug-in tools.

Highlights

  • Processing and managing large volumes of data, as in search engines, data centers and data mining systems, requires an infrastructure for storing and retrieving data, and distributed file systems are an essential component of that storage infrastructure [1]

  • A distributed file system (DFS) is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer [3]. The DFS is used to build a hierarchical view of multiple file servers and shares on the network; a short client sketch after this list illustrates this

  • We describe the architecture of the Hadoop Distributed File System (HDFS) and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! [10]
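
To make the "as if it were on their own computer" point concrete, the following minimal sketch reads a file from HDFS through the standard Hadoop FileSystem API. The NameNode URI hdfs://namenode:9000 and the path /data/sample.txt are illustrative assumptions, not values taken from the paper.

    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            // Placeholder cluster address; a real deployment supplies its own NameNode URI.
            URI cluster = URI.create("hdfs://namenode:9000");
            Configuration conf = new Configuration();
            // The client asks the NameNode for metadata and streams the file's blocks
            // from DataNodes, but the program reads through an ordinary InputStream.
            try (FileSystem fs = FileSystem.get(cluster, conf);
                 InputStream in = fs.open(new Path("/data/sample.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }

Although the data may be spread over many DataNodes, nothing in the client code reflects that distribution, which is exactly the transparency the highlight describes.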


Summary

Introduction

The ability to process and manage large volumes of data, as in search engines, data centers and data mining systems, requires an infrastructure for storing and retrieving data, and distributed file systems are an essential component of that storage infrastructure [1]. A DFS provides permanent storage for sharing multiple files and builds a hierarchical and unified view of those files by federating storage resources dispersed across a network; in other words, it is a file system that supports the sharing of files as persistent storage over a set of network-connected nodes. This paper is divided into two sections. In the first section, we define the DFS, show its properties and its mechanisms for storing and sharing data, survey the approaches used by big data systems and, as a case study, present Hadoop and the HDFS architecture. In the second section, we present Event-B and modeling in Event-B with the Rodin tool, and demonstrate the benefits and features of using Event-B to analyze and model such systems.
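
The paper's actual machines belong to its Abstract Model and First refinement sections and are not reproduced on this page. As a purely illustrative sketch of the Event-B style (all identifiers hypothetical), an abstract machine for a file store can be written in LaTeX notation as follows:

    \begin{array}{l}
    \textbf{machine}\ \mathit{FileStore} \\
    \textbf{variables}\ \mathit{files} \\
    \textbf{invariant}\ \mathit{files} \subseteq \mathit{FILE} \\
    \textbf{event}\ \mathit{INITIALISATION} \;\widehat{=}\; \mathit{files} := \varnothing \\
    \textbf{event}\ \mathit{AddFile} \;\widehat{=}\; \\
    \quad \textbf{any}\ f\ \textbf{where}\ f \in \mathit{FILE} \setminus \mathit{files} \\
    \quad \textbf{then}\ \mathit{files} := \mathit{files} \cup \{f\}\ \textbf{end}
    \end{array}

For each such event, Rodin generates invariant-preservation proof obligations; here, for instance, that AddFile maintains files ⊆ FILE, which its guard makes straightforward to discharge.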

Distributed File Systems
Traditional Approach
MapReduce Approach
Hadoop
HDFS Architecture
Using the formal method Event-B to model
Abstract Model
First refinement
Results and proof statistics
Conclusion