Abstract

This paper reviews recent research progress on big data processing platforms and algorithms based on the MapReduce programming model. First, it introduces 12 typical MapReduce-based big data processing platforms. It analyzes and compares their implementation principles and applicable scenarios, and it abstracts their commonalities. Next, it introduces MapReduce-based big data analysis algorithms, including search algorithms, data cleaning/transformation algorithms, aggregation algorithms, join algorithms, sorting algorithms, preference queries, optimization computation methods, graph algorithms, and data mining algorithms. It classifies these algorithms according to how they are implemented in MapReduce and analyzes the factors that affect their performance. Finally, it abstracts big data processing algorithms as external memory algorithms and summarizes the characteristics of such algorithms. For researchers' reference, it also proposes research ideas and open problems for general-purpose performance optimization of external memory algorithms. Specifically, these include optimizing the disk I/O of external memory algorithms, improving their locality, and designing incremental iterative algorithms. Existing research on big data processing platforms and algorithms mostly focuses on three areas: dynamic platform performance optimization based on resource allocation and task scheduling, parallelization of specific algorithms, and specific algorithmic … This chapter gives researchers a broad survey of MapReduce Big Data Processing: Platform, Tools, and Algorithms.
