Enhanced dual Bloom filter based on SSD for efficient directory parsing in cloud storage system

Manyun Kim, Sang Won Lee, Hee Yong Youn, Kyung Hwan Oh

doi:10.1109/iccnc.2015.7069379

Abstract

In a file system used for big data analytics, hundreds of thousands of files exist, and in such a huge storage system retrieving the metadata of a file takes a long time. In this paper we propose an enhanced Bloom filter to accelerate the directory parsing process in large-scale file systems. A cache implemented on SSD keeps the metadata of frequently or recently accessed directories and files. When a file is requested, the system first attempts to get its metadata from the SSD; if the metadata is not there, the SSD access is wasted time. To avoid such unnecessary SSD accesses, we propose the flag-augmented Bloom filter (FABF), which predicts whether the metadata of the requested file exists in the cache. Analytical modeling demonstrates that FABF reduces both the false positive rate and the false negative rate compared to the existing scheme. In addition, the implementation overhead of the proposed scheme is small.
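The abstract does not say how the FABF's flags work, so the following is only a minimal sketch of the baseline idea it builds on: a plain Bloom filter consulted before the SSD, so that a definite miss skips the SSD read entirely. The names `BloomFilter`, `MetadataCache` and `expected_fpr`, the double-hashing scheme, and the defaults m = 8192 bits and k = 4 hash functions are illustrative assumptions, and a Python dict stands in for the SSD cache. `expected_fpr` is the standard approximation (1 - e^(-kn/m))^k for a plain Bloom filter, not the paper's analytical model.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter with m bits and k hash functions (not the paper's FABF)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing (an assumption): derive k bit indices from two
        # 64-bit halves of a SHA-256 digest; h2 is forced odd.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def expected_fpr(m: int, k: int, n: int) -> float:
    """Standard false positive approximation for a plain Bloom filter after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


class MetadataCache:
    """Hypothetical SSD metadata cache gated by a Bloom filter."""

    def __init__(self, m: int = 8192, k: int = 4) -> None:
        self.filter = BloomFilter(m, k)
        self.ssd = {}  # stand-in for the SSD cache: path -> metadata
        self.ssd_reads = 0  # counts SSD accesses actually performed

    def put(self, path: str, metadata: dict) -> None:
        self.ssd[path] = metadata
        self.filter.add(path)

    def get(self, path: str):
        if not self.filter.might_contain(path):
            return None  # definite miss: skip the SSD access
        self.ssd_reads += 1  # filter says "maybe": pay for one SSD read
        return self.ssd.get(path)  # None here means the filter gave a false positive


if __name__ == "__main__":
    cache = MetadataCache()
    cache.put("/data/logs/part-0000", {"size": 4096})
    print(cache.get("/data/logs/part-0000"))  # {'size': 4096}
    print(cache.get("/data/logs/part-9999"))  # None
    print(expected_fpr(8192, 4, 1000))
```

A plain Bloom filter never reports a false negative for a key it has seen, but it cannot remove keys, so metadata evicted from the SSD leaves stale bits behind that later show up as false positives; the sketch ignores eviction altogether.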

Full Text