Abstract

Data management has become a challenging issue for network-centric applications that need to process large datasets, and such systems require advanced tools to analyse them. MapReduce, together with Hadoop, is widely used as an efficient parallel programming model for large-scale data analysis. However, MapReduce still suffers from performance problems: its shuffle phase lacks an individual shuffle service component with an efficient I/O policy. The map phase also requires performance improvement, since its output serves as the input to the next phase and largely determines overall efficiency; it therefore needs intermediate checkpoints that regularly monitor all the splits produced during intermediate phases. The absence of these mechanisms acts as a barrier to effective resource utilisation. This paper implements shuffle as a service component to decrease the overall execution time of jobs, monitors the map phase through skew handling, and increases resource utilisation in the cluster.
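To make the skew-handling idea concrete, the following is a minimal sketch of the kind of checkpoint monitoring the abstract describes: periodically inspecting the sizes of intermediate map-output partitions and flagging any partition whose size exceeds a multiple of the mean. The class name, method names, and the skew threshold are illustrative assumptions for this sketch, not the paper's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of a skew check over intermediate map-output partitions.
 * All names and the threshold are hypothetical, not the paper's code.
 */
public class SkewMonitor {

    // Assumed threshold: a partition counts as skewed if it is larger
    // than SKEW_FACTOR times the mean partition size.
    private static final double SKEW_FACTOR = 2.0;

    /** Returns the partitions whose intermediate size looks skewed. */
    public static Map<Integer, Long> findSkewedPartitions(Map<Integer, Long> partitionBytes) {
        double mean = partitionBytes.values().stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0);

        Map<Integer, Long> skewed = new HashMap<>();
        for (Map.Entry<Integer, Long> e : partitionBytes.entrySet()) {
            if (e.getValue() > SKEW_FACTOR * mean) {
                skewed.put(e.getKey(), e.getValue());
            }
        }
        return skewed;
    }

    public static void main(String[] args) {
        // Simulated intermediate sizes (in bytes) per reducer partition.
        Map<Integer, Long> sizes = new HashMap<>();
        sizes.put(0, 100_000L);
        sizes.put(1, 120_000L);
        sizes.put(2, 900_000L); // heavily skewed partition
        sizes.put(3, 110_000L);

        findSkewedPartitions(sizes).forEach((p, bytes) ->
                System.out.printf("Partition %d is skewed: %d bytes%n", p, bytes));
    }
}
```

A scheduler could run such a check at each checkpoint and trigger repartitioning or extra reducers for flagged partitions before the shuffle phase begins.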
