Research of Massive Small Files Reading Optimization Based on Parallel Network File System

Yang Hongzhang, Junwei Zhang, Xiangchao Zeng, Lu Xu, Huanqing Dong

doi:10.1109/hpcc-css-icess.2015.97

Abstract

With the rapid growth of cloud computing and big data, the number of small files keeps increasing. Managing massive numbers of small files efficiently while providing low-latency service has become a hot topic for the Parallel Network File System (pNFS). When pNFS reads massive small files, metadata is accessed very frequently and disk efficiency is low, so small-file access performance is far below large-file access performance. This paper presents an optimization mechanism for reading small files that comprises extended read dir delegation, radical metadata pre-read, and large-IO data pre-read across small files. These optimizations can significantly reduce read latency and make full use of the client cache. Extensive experiments demonstrate the effectiveness of the mechanism: when reading massive small files, compared with pNFS, metadata reading performance is 1959% higher, sequential data reading is 2436% higher, random data reading is 1675% higher, and overall performance is 1767% higher.
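To make the idea of large-IO data pre-read across small files concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes small files are packed contiguously on a data server, so that one large read can bring several neighboring files into the client cache. The names `ToyServer`, `ToyClient`, `build_server`, and the `window` parameter are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical data server: small files packed back to back in one region."""
    region: bytes
    extents: dict  # file name -> (offset, length) inside region
    io_count: int = 0

    def read(self, offset: int, length: int) -> bytes:
        self.io_count += 1  # each call models one network/disk round trip
        return self.region[offset:offset + length]


@dataclass
class ToyClient:
    server: ToyServer
    window: int = 64  # bytes fetched per large IO (assumed tunable)
    cache: dict = field(default_factory=dict)

    def read_file(self, name: str) -> bytes:
        if name in self.cache:  # served from client cache, no server IO
            return self.cache[name]
        offset, length = self.server.extents[name]
        blob = self.server.read(offset, max(self.window, length))
        # Cache every file that lies completely inside the fetched span.
        for other, (o, n) in self.server.extents.items():
            if offset <= o and o + n <= offset + len(blob):
                self.cache[other] = blob[o - offset:o - offset + n]
        return self.cache[name]


def build_server(files: dict) -> ToyServer:
    """Pack the given files contiguously and record their extents."""
    region, extents = b"", {}
    for name, data in files.items():
        extents[name] = (len(region), len(data))
        region += data
    return ToyServer(region, extents)


files = {f"f{i}": bytes([65 + i]) * 16 for i in range(8)}  # 8 files, 16 bytes each
server = build_server(files)
client = ToyClient(server, window=64)
contents = [client.read_file(name) for name in files]
print(server.io_count)
```

In this toy setup, reading all eight 16-byte files in order costs two server reads instead of eight, because each 64-byte read also populates the cache with the three files that follow. The real gain in pNFS depends on the on-disk layout and cache policy, which this sketch does not model.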
