Abstract

Large-scale data is growing so quickly that the traditional big data processing framework Hadoop has hit a bottleneck at its data storage layer. Running Hadoop on modern HPC clusters has attracted much attention because of Hadoop's data processing and analysis capabilities. Lustre is a promising parallel file system that has held a large share of the HPC file system market for many years. A Lustre-based Hadoop platform therefore presents many new opportunities and challenges in today's data era. In this paper, we build a Lustre-based Hadoop by implementing a custom LustreFileSystem class that inherits from the FileSystem class in the Hadoop source code. To make full use of Lustre's high performance, we propose a novel dynamic stripe strategy that optimizes the stripe size while data is written to the Lustre file system. Our results indicate that, compared with the original Hadoop, throughput (MB/s) improves by about 3x for writes and 11x for reads, while the average I/O rate (MB/s) improves by at least 3x. In addition, compared with an existing Lustre-based Hadoop, our dynamic stripe strategy smooths read operations and slightly improves write performance.
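
The abstract does not state the rule used to pick a stripe size, so the following is only a minimal sketch of what a dynamic choice could look like, not the authors' method. It assumes a hypothetical heuristic that spreads the expected file size evenly across the stripes, rounds up to the 64 KiB granularity that Lustre requires for stripe sizes, and clamps the result to an assumed range of 1 MiB to 64 MiB. The function names, the bounds, and the heuristic itself are illustrative assumptions; only the `lfs setstripe` options `-S` (stripe size) and `-c` (stripe count) come from the standard Lustre tooling.

```python
KIB = 1024
MIB = 1024 * KIB

# Lustre stripe sizes must be a multiple of 64 KiB.
STRIPE_ALIGNMENT = 64 * KIB


def choose_stripe_size(expected_bytes, stripe_count,
                       min_size=1 * MIB, max_size=64 * MIB):
    """Hypothetical heuristic: one aligned stripe per OST, clamped to a range.

    min_size and max_size are assumed bounds and should themselves be
    multiples of STRIPE_ALIGNMENT.
    """
    if expected_bytes < 0 or stripe_count < 1:
        raise ValueError("expected_bytes must be >= 0 and stripe_count >= 1")
    # Ceiling division: bytes each stripe would hold if the file were
    # split evenly across stripe_count objects.
    per_stripe = -(-expected_bytes // stripe_count)
    # Round up to the next multiple of the alignment.
    aligned = -(-per_stripe // STRIPE_ALIGNMENT) * STRIPE_ALIGNMENT
    return max(min_size, min(max_size, aligned))


def setstripe_command(path, stripe_size, stripe_count):
    """Build the argument list for `lfs setstripe`; nothing is executed here."""
    return ["lfs", "setstripe", "-S", str(stripe_size),
            "-c", str(stripe_count), path]
```

For a 128 MiB output file striped over 4 objects, this heuristic selects a 32 MiB stripe size; the returned argument list could then be passed to `subprocess.run` on a Lustre client before the file is first written, since a layout is set when a file is created.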
