Performance Improvement of Hadoop ext4-based Disk I/O

Makoto Nakagami, Saneyasu Yamaguchi, Jose A B Fortes

doi:10.1109/candar51075.2020.00032

Abstract

Hadoop is one of the most popular big-data analytics platforms. It often relies on hard disk drives to store data volumes that exceed the capacity of solid-state drives. Unlike other data-intensive applications such as database management systems, big-data processing jobs frequently issue large numbers of sequential I/O requests. Previously proposed methods for improving sequential I/O performance modified the block usage bitmap of the Ext2/3 filesystems to make active use of the faster disk zones, that is, the outer zones of each hard disk drive. However, these methods do not support Ext4, the current version of the Ext filesystem family. In this paper, we discuss a method for improving the sequential I/O performance of the Ext4 filesystem. First, we evaluate sequential file access throughput on the Ext3, Ext4, and XFS filesystems. We point out that Ext4 does not actively reuse the area freed by deleting existing files, which degrades file access performance. Second, we propose a method for improving Ext4 sequential file access performance. The improved Ext4 actively uses the faster zones of storage devices by controlling where files are placed. Third, we evaluate the proposed filesystem and show that it outperforms existing filesystems. In the TeraSort benchmark, Hadoop running on the proposed Ext4 filesystem outperforms Hadoop on the original Ext4 filesystem by up to 30.1%.
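The placement idea can be sketched with a toy model. This is a minimal sketch under stated assumptions, not the authors' Ext4 modification: it assumes that low block numbers map to the outer, faster zones of a hard disk, and it reduces allocation to a search over a used/free bitmap. `ToyBlockAllocator`, `demo`, and the `outer_first` flag are hypothetical names. Real Ext4 allocation is considerably more involved. The sketch only contrasts a policy that always searches from the outer zone, and therefore reuses space freed by deleted files, with one that keeps advancing past earlier allocations and leaves that space idle.

```python
from typing import List, Optional


class ToyBlockAllocator:
    """Hypothetical contiguous-run allocator over a used/free block bitmap.

    Assumption: block 0 lies on the outermost (fastest) HDD zone, and
    higher block numbers move toward the slower inner zones.
    """

    def __init__(self, num_blocks: int, outer_first: bool) -> None:
        self.used: List[bool] = [False] * num_blocks  # True = block in use
        self.outer_first = outer_first  # True: every search restarts at block 0
        self.cursor = 0                 # where a forward-moving search resumes

    def _find_run(self, start: int, length: int) -> Optional[int]:
        """Return the first block >= start that begins `length` free blocks."""
        run = 0
        for i in range(start, len(self.used)):
            run = run + 1 if not self.used[i] else 0
            if run == length:
                return i - length + 1
        return None

    def allocate(self, length: int) -> int:
        # Outer-first searches from block 0; otherwise continue after the last file.
        start = 0 if self.outer_first else self.cursor
        pos = self._find_run(start, length)
        if pos is None and start > 0:
            pos = self._find_run(0, length)  # wrap around to the outer zone
        if pos is None:
            raise RuntimeError("no contiguous free run of that length")
        for i in range(pos, pos + length):
            self.used[i] = True
        self.cursor = pos + length
        return pos

    def free(self, pos: int, length: int) -> None:
        # Deleting a file clears its blocks in the bitmap.
        for i in range(pos, pos + length):
            self.used[i] = False


def demo(outer_first: bool) -> int:
    """Write two 10-block files, delete the first, then place a third."""
    alloc = ToyBlockAllocator(num_blocks=100, outer_first=outer_first)
    first = alloc.allocate(10)  # blocks 0-9
    alloc.allocate(10)          # blocks 10-19
    alloc.free(first, 10)       # deleting the first file frees blocks 0-9
    return alloc.allocate(10)


print(demo(outer_first=True))   # 0: freed outer-zone blocks are reused
print(demo(outer_first=False))  # 20: allocation continues past the old files
```

The `outer_first=True` policy mirrors the abstract's goal of steering new files toward the faster zones. The forward-moving policy is only a simplified stand-in for the behavior the authors observe in Ext4, not a model of its actual allocator.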

Similar Papers

Competition of virtualized ext4, xfs and btrfs filesystems under type-2 hypervisor
Dj Pesic ... V Timcenko
01 Nov 2016

An Empirical Evaluation of NVM-Aware File Systems on Intel Optane DC Persistent Memory Modules
Guangyu Zhu ... Jaehyun Han
Electronics | VOL. 10
17 Aug 2021

FUSE based file system for efficient storage and retrieval of fragmented multimedia files
Wasim Ahmad Bhat
Journal of King Saud University - Computer and Information Sciences | VOL. 34
27 Aug 2022

Ext4 and XFS File System Forensic Framework Based on TSK
Hyungchan Kim ... Sungbum Kim
Electronics | VOL. 10
20 Sep 2021
