Abstract

In today’s computing era, the voluminous data generated every moment demands special tools and techniques for effective and efficient handling and storage. In this paper, a technique for efficiently storing small files in the Hadoop Distributed File System (HDFS) is proposed. The proposal works by filtering incoming files on the basis of two parameters: “file type” (text, PDF, document, binary, etc.) and “file size” (the amount of storage space the file requires). To secure the contents of the files, we also propose encrypting them with the Twofish cipher. This filtering and encryption is carried out before the files are passed on to HDFS. For efficient storage, the small files are merged into a single unit. The basic criterion for merging small files here is a “dynamic merging technique” tailored to the type of file, instead of a generalized merging strategy. Furthermore, for efficient routing of files between source and destination, the concept of Software Defined Networking (SDN) has been adopted in the proposal. The empirical results show that the proposed architecture reduces NameNode memory overhead as well as disk seek time to a great extent.
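To make the filter-and-merge idea concrete, the following is a minimal Python sketch, not the paper’s implementation. The 16 MB small-file threshold is an illustrative assumption (the abstract does not fix a value), and `twofish_encrypt` is a hypothetical placeholder standing in for a real Twofish library; files are grouped by type and each group is concatenated into one encrypted unit before being handed to HDFS.

```python
import os
from collections import defaultdict

# Hypothetical cutoff below which a file counts as "small";
# the paper does not specify a value, so 16 MB is illustrative.
SMALL_FILE_THRESHOLD = 16 * 1024 * 1024


def twofish_encrypt(data: bytes, key: bytes) -> bytes:
    # Identity placeholder standing in for a real Twofish cipher;
    # in practice a call into a Twofish library would go here.
    return data


def classify(path: str) -> tuple[str, bool]:
    """Return (file_type, is_small) for an incoming file."""
    file_type = os.path.splitext(path)[1].lstrip(".").lower() or "bin"
    return file_type, os.path.getsize(path) < SMALL_FILE_THRESHOLD


def filter_and_merge(paths: list[str], key: bytes) -> dict[str, bytes]:
    """Group small files by type, concatenate each group into one
    unit, and encrypt everything before it is written to HDFS."""
    small_by_type: dict[str, list[bytes]] = defaultdict(list)
    output: dict[str, bytes] = {}
    for path in paths:
        file_type, is_small = classify(path)
        with open(path, "rb") as f:
            data = f.read()
        if is_small:
            small_by_type[file_type].append(data)
        else:
            # Large files bypass merging and are encrypted as-is.
            output[path] = twofish_encrypt(data, key)
    for file_type, blobs in small_by_type.items():
        # One merged object per file type (type-aware "dynamic" merging),
        # so the NameNode tracks one entry instead of many.
        output[f"merged.{file_type}"] = twofish_encrypt(b"".join(blobs), key)
    return output
```

A complete implementation would additionally record per-file offsets in an index so that individual files can later be recovered from a merged unit.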
