Abstract

With the growth of online services, a vast number of files are generated by users and by the services themselves. To serve users with diverse network environments and devices, online services commonly keep multiple versions of the same file at different sizes: users with high-speed networks and high-end displays receive a large, high-precision file, while users on mobile devices typically receive a smaller, lower-precision one. In some cases, a large file is split into small files so that it is easier to transmit over wide-area networks. As a result, the underlying filesystem must efficiently maintain a large number of small files, and providing such a huge number of files to applications is a new challenge for existing filesystems. In this paper, we propose techniques to efficiently manage a large number of files in digital archives by exploiting the data characteristics and access patterns of the application. Based on this knowledge of the upper-layer applications, we modified both the in-memory and on-disk inode structures of an existing filesystem and dramatically reduced the number of storage I/O operations required to serve the same files. Our experimental results show that the proposed methods significantly reduce the number of storage I/O operations for both reading and writing files, especially small ones. Moreover, using several synthetic benchmarks and microbenchmarks, we demonstrate that the proposed techniques reduce application-level latency and improve file-operation throughput.
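The abstract does not specify the exact inode modification, so the following is a minimal sketch of one plausible approach consistent with its claim: inlining small-file data directly in the on-disk inode so that a read costs a single I/O instead of an inode read plus a data-block read. All names, sizes, and capacities below are assumptions for illustration, not the paper's actual design.

```python
# Illustrative model (assumed parameters, not from the paper):
# count the storage I/O operations needed to read a file of a given size,
# with and without inlining small-file data in the inode.

INLINE_CAPACITY = 128   # bytes of file data assumed to fit inside the inode
BLOCK_SIZE = 4096       # assumed filesystem block size in bytes


def reads_conventional(file_size: int) -> int:
    """I/O operations when file data lives in separate data blocks:
    one read for the inode plus one read per data block."""
    data_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    return 1 + data_blocks


def reads_inlined(file_size: int) -> int:
    """I/O operations when small files are inlined in the inode:
    a single inode read suffices whenever the data fits inline."""
    if file_size <= INLINE_CAPACITY:
        return 1
    return reads_conventional(file_size)


for size in (100, 1000, 10_000):
    print(f"{size:>6} bytes: conventional={reads_conventional(size)}, "
          f"inlined={reads_inlined(size)}")
```

Under this model, a 100-byte file drops from two I/O operations to one, while large files are unaffected, which matches the abstract's observation that the benefit is greatest for small files.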
