Abstract

The big data revolution has brought great opportunities as well as significant challenges. Major challenges include capturing, storing, transferring, analysing, processing and updating these large and complex datasets. Traditional data handling techniques cannot manage this fast-growing data. Apache Hadoop is one of the best technologies for addressing the challenges involved in big data handling. Hadoop provides a centrally coordinated, distributed data storage model. The InterPlanetary File System (IPFS) is an emerging technology that provides decentralised, distributed storage. By integrating these two technologies, we can create a better framework for the distributed storage and processing of big data. In the proposed work, we formulate a model for big data placement, replication and processing by combining the features of Hadoop and IPFS. The Hadoop Distributed File System (HDFS) and IPFS jointly handle the data placement and replication tasks, and the MapReduce programming framework in Hadoop handles the data processing task. The experimental results show that the proposed framework can achieve cost-effective storage as well as faster processing of big data.
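As an illustrative sketch of how the storage side of such an integration might look (not the authors' implementation), the snippet below adds a dataset file to a local IPFS node through its standard HTTP API and keeps the returned content identifier (CID), which a Hadoop MapReduce job could later use to fetch the data for processing. The daemon address, file name and helper functions are assumptions made for illustration only.

```python
# Sketch: place a dataset block in IPFS and keep its CID so that a
# Hadoop/MapReduce job can later locate the data (illustrative only).
# Assumes a local IPFS daemon exposing the standard HTTP API on port 5001.
import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"  # assumed local daemon address


def add_to_ipfs(path: str) -> str:
    """Add a file to IPFS and return its content identifier (CID)."""
    with open(path, "rb") as f:
        resp = requests.post(f"{IPFS_API}/add", files={"file": f})
    resp.raise_for_status()
    return resp.json()["Hash"]


def cat_from_ipfs(cid: str) -> bytes:
    """Retrieve file contents by CID; a MapReduce input reader could do this per split."""
    resp = requests.post(f"{IPFS_API}/cat", params={"arg": cid})
    resp.raise_for_status()
    return resp.content


if __name__ == "__main__":
    cid = add_to_ipfs("dataset_block.csv")  # hypothetical input file
    print("Stored block under CID:", cid)
    # Only the CID (not the raw data) would then be recorded on the Hadoop
    # side, so IPFS handles replication while MapReduce drives processing.
```

How the CIDs are recorded on the Hadoop side (in HDFS, in the job configuration, or elsewhere) is a design choice of the proposed framework; this sketch covers only the IPFS portion of the placement step.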
