Abstract

Small files are frequently created and accessed in pervasive computing, where information is processed with limited resources by linking with objects as they are encountered. The Hadoop framework, although the de facto big data processing platform and very popular in practice, cannot process such small files efficiently. In this paper, we propose a scalable HDFS-based storage framework, named SHAstor, to improve the throughput of small writes in the pervasive computing paradigm. In contrast to classic HDFS, the essence of this approach is to merge incoming small writes into a large chunk of data, either at the client side or at the server side, and store the chunk as a single big file in the framework. This substantially reduces the number of small files required to process pervasively gathered information. To this end, the framework takes HDFS as its basis and adds three extra modules that merge and index the small files as pervasive applications perform read/write operations. To further facilitate this process, a new ancillary namenode can optionally be installed to store the index table. With this optimization, SHAstor not only handles small writes efficiently but also scales out with the number of datanodes to improve the performance of pervasive applications.
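To make the merging idea concrete, the sketch below shows one way a client-side merger could append small writes to a single large HDFS file while recording, per logical small file, its byte offset and length. This is only an illustrative approximation of the approach described above: the class name, paths, and in-memory map are hypothetical, and in SHAstor the index table would be kept by the ancillary namenode rather than in client memory.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative client-side small-write merger (not the SHAstor implementation):
 * incoming small writes are appended to one large HDFS file ("chunk"), and an
 * index records where each logical small file starts and how long it is.
 */
public class SmallWriteMerger {

    /** Index entry: byte offset and length of a small file inside the chunk. */
    public record Entry(long offset, int length) {}

    private final FSDataOutputStream chunkOut;
    // Kept in memory only for illustration; SHAstor stores the index table
    // at the (optional) ancillary namenode.
    private final Map<String, Entry> index = new HashMap<>();

    public SmallWriteMerger(FileSystem fs, Path chunkPath) throws IOException {
        this.chunkOut = fs.create(chunkPath);
    }

    /** Append one small write to the chunk and record its location. */
    public synchronized void write(String name, byte[] data) throws IOException {
        long offset = chunkOut.getPos();   // current position in the big file
        chunkOut.write(data);              // merge the small write into the chunk
        index.put(name, new Entry(offset, data.length));
    }

    /** Close the chunk; the index would then be handed to the index service. */
    public void close() throws IOException {
        chunkOut.close();
    }

    public Map<String, Entry> index() {
        return index;
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        SmallWriteMerger merger = new SmallWriteMerger(fs, new Path("/shastor/chunk-0001"));
        merger.write("sensor-42/reading-1.json", "{\"t\": 21.5}".getBytes());
        merger.write("sensor-42/reading-2.json", "{\"t\": 21.7}".getBytes());
        merger.close();
    }
}
```

A read path would simply look up the (offset, length) pair for a name and issue a positioned read on the chunk file, so many small objects are served from one HDFS block stream instead of one file each.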
