Research on MapReduce Based Incremental Iterative Model and Framework

Jie Song, Chaopeng Guo, Yichuan Zhang, Zhiliang Zhu, Ge Yu

doi:10.1080/03772063.2014.987703

Abstract

In the big data environment, MapReduce can be adopted to improve the efficiency of iterative algorithms on massive data by running them on larger PC clusters. However, it is inefficient to re-iterate over the entire data set whenever new data is introduced. In this paper, an incremental iterative computing model (I2M), based on the incremental data and the original iterative results, is proposed. Then, MapReduce and I2M based implementations of descendant query, PageRank, and K-means are presented. Finally, an incremental iterative computing framework (I2F) is implemented by extending HaLoop to support incremental iterative computing. A series of test cases is designed to evaluate I2F in terms of functionality, performance, and the cost of incremental iteration. The incremental iterative model proposed in this paper can accommodate many iterative algorithms, and promotes the application and optimization of iterative algorithms in the big data environment.
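To make the idea of iterating from "the incremental data and original iterative results" concrete, the following is a minimal single-machine sketch using PageRank. It is not the I2M model or the I2F framework described in the paper, and it does not use MapReduce or HaLoop; it only illustrates seeding a new iteration with previously converged results after the input grows. The function `pagerank`, its parameters, and the example graph are assumptions introduced for illustration.

```python
def pagerank(graph, ranks=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over an adjacency dict {node: [successors]}.

    `ranks` (hypothetical parameter) optionally seeds the iteration with
    results from an earlier run, which is the incremental idea sketched here.
    Returns (ranks, number_of_iterations).
    """
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    n = len(nodes)
    # Reuse earlier results where available; unseen nodes start at 1/n.
    current = {u: (ranks or {}).get(u, 1.0 / n) for u in nodes}
    total = sum(current.values())
    current = {u: r / total for u, r in current.items()}  # renormalise to 1

    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(current[u] for u in nodes if not graph.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {u: base for u in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            for v in outs:
                nxt[v] += damping * current[u] / len(outs)
        delta = sum(abs(nxt[u] - current[u]) for u in nodes)
        current = nxt
        if delta < tol:
            return current, iteration
    return current, max_iter


# Original data and its converged result.
original = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
old_ranks, _ = pagerank(original)

# Incremental data: a new node "e" and a new edge d -> e.
updated = {**original, "e": ["a"], "d": ["c", "e"]}

cold, cold_iters = pagerank(updated)                   # recompute from scratch
warm, warm_iters = pagerank(updated, ranks=old_ranks)  # start from old results
```

Both runs converge to the same ranks because damped power iteration has a unique fixed point; the warm start only changes the starting point. Whether it saves iterations depends on how much the new data perturbs the old result, which is the kind of cost the paper's evaluation of incremental iteration addresses.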

Full Text