Performance Enhancement of Distributed System Using HDFS Federation and Sharding

Praveen M Dhulavvagol, S G Totad

doi:10.1016/j.procs.2023.01.254

Abstract

In today's world, 2.5 exabytes of data are generated and processed by the IT industry and other organizations. Processing and managing such a massive volume of big data is challenging. Hadoop (HDFS) is a widely used framework for processing big data. A limitation of HDFS is that it can scale only the data nodes and not the name node, where the metadata is managed. Because the present Hadoop architecture does not scale name nodes, managing metadata at exabyte scale becomes an essential and challenging task, leading to tightly coupled block storage, limited namespace scalability, and performance bottlenecks. The proposed approach, a dynamic federated metadata management (DFMM) architecture, manages the metadata held in the name nodes. DFMM configures multiple name nodes using the HDFS federation technique, and the metadata is partitioned across these name nodes using sharding. Sharding manages the metadata using locality-preserving hashing and consistent hashing. Locality-preserving hashing stores metadata in the most appropriate name node based on the file's parent directory, and metadata is dynamically distributed among name nodes to maintain load balance. DFMM also maintains replicas for high availability and reliability of data. The results show that the proposed DFMM architecture combined with sharding is highly scalable and manages metadata efficiently: it can store up to 1 billion files with a namespace size of 400 GB and improves throughput by 21% compared to existing techniques. A comparative analysis of existing HDFS and the proposed DFMM architecture shows that DFMM combined with sharding enhances performance and manages metadata more efficiently.
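The sharding idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DFMM implementation: the class name `MetadataRing`, the name node labels, the MD5-based hash, and the number of virtual nodes are all assumptions made for illustration. The sketch approximates locality-preserving placement by hashing a file's parent directory, so that files in the same directory map to the same name node, and it uses a consistent-hash ring so that adding a name node relocates only part of the namespace. Replication and load-driven rebalancing are left out.

```python
import bisect
import hashlib
import posixpath


def _hash(key: str) -> int:
    """Stable 64-bit hash (MD5-based, so placement does not vary between runs)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class MetadataRing:
    """Hypothetical consistent-hash ring that maps parent directories to name nodes."""

    def __init__(self, namenodes, vnodes=64):
        self.vnodes = vnodes  # virtual nodes per name node (assumed value)
        self._ring = []       # sorted list of (hash, namenode) points
        for nn in namenodes:
            self.add_namenode(nn)

    def add_namenode(self, nn):
        # Place several virtual points per name node to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{nn}#{i}"), nn))

    def remove_namenode(self, nn):
        self._ring = [(h, n) for h, n in self._ring if n != nn]

    def locate(self, path):
        # Locality: hash the parent directory, not the full path, so that
        # sibling files keep their metadata on the same name node.
        parent = posixpath.dirname(path) or "/"
        idx = bisect.bisect(self._ring, (_hash(parent), ""))
        # Wrap around to the first point when the hash is past the last one.
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = MetadataRing(["nn1", "nn2", "nn3"])
    for p in ["/user/alice/a.txt", "/user/alice/b.txt", "/logs/2023/app.log"]:
        print(p, "->", ring.locate(p))
```

With this layout, a directory listing touches a single name node, and when a name node is added each directory either stays where it was or moves to the new node.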

Full Text