Abstract

The rapid growth of data promotes the development of parallel computing. MapReduce, which is a simplified programming model of distributed parallel computing, is becoming more and more popular. In this paper, we design and implementation of parallel statistical algorithm based on Hadoop's MapReduce model. The algorithm, which is used to grasp the overall characteristics of massive data, involves the calculation of central tendency, dispersion and distribution tendency. By experiment, we come to the conclusion that the algorithm is suitable for dealing with large-scale data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call