Abstract

The Hadoop Distributed File System (HDFS) is widely used as the underlying storage engine by many Big Data processing frameworks, such as Hadoop MapReduce, HBase, Hive, and Spark. This makes the performance of HDFS a primary concern in the Big Data community. Recent studies have shown that HDFS cannot fully exploit the performance benefits of RDMA-enabled high-performance interconnects like InfiniBand. To address this, RDMA-enabled HDFS designs have been proposed in the literature, and they show better performance on RDMA-enabled networks. However, these designs are tightly integrated with specific versions of the Apache Hadoop distribution and cannot easily be used with other Hadoop distributions. In this paper, we propose an efficient RDMA-based plugin for HDFS that can be easily integrated with various Hadoop distributions and versions, such as Apache Hadoop 2.5 and 2.6, Hortonworks HDP, and Cloudera CDH. Performance evaluations show that our plugin delivers the performance expected of the hybrid RDMA-enhanced design, up to 3.7x improvement in TestDFSIO write, across all these distributions. We also demonstrate that our RDMA-based plugin achieves up to 4.6x improvement over the Mellanox R4H (RDMA for HDFS) plugin.
