Abstract

Out-of-memory (OOM) errors are common and serious in MapReduce applications. Because the MapReduce framework hides the details of distributed execution, it is challenging for users to pinpoint the root causes of OOM errors. Current memory analyzers and memory leak detectors can only determine which objects are (unnecessarily) persisted in memory. They cannot determine where those objects come from or why they become so large. Thus, they cannot identify the root causes of OOM errors.

Our empirical study of 56 OOM errors in real-world MapReduce applications found three root causes: improper job configurations, data skew, and memory-consuming user code. To identify these root causes, we design a memory profiling tool, Mprof. Mprof automatically profiles and quantifies the correlation between a MapReduce application's runtime memory usage and its static information (input data, configurations, and user code). It does this by modeling and profiling the application's dataflow and the memory usage of its user code, and then performing correlation analysis on them. Based on this correlation, Mprof uses quantitative rules to trace OOM errors back to the problematic user code, data, and configurations.

We evaluated Mprof by diagnosing 28 real-world OOM errors in diverse MapReduce applications. Mprof accurately identified the root causes of 23 of these errors and partially identified the root causes of the remaining 5.
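
The sketch below illustrates the general idea of correlating dataflow with memory usage and then applying quantitative rules. It is not Mprof's implementation. Several elements are hypothetical assumptions chosen only for illustration: the per-task measurements (the number of records in a reduce task's largest key group and that task's peak heap usage in MB), the thresholds `r_min` and `skew_ratio`, and the three-way rule in `diagnose`. The sketch computes a Pearson correlation and a least-squares slope (estimated MB of heap per record) between the two measurements. It then maps the result onto the three root-cause categories named above.

```python
import statistics
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # no variation: correlation undefined, treat as none
    return cov / sqrt(vx * vy)


def slope(xs, ys):
    """Least-squares slope of ys on xs, e.g. estimated MB of heap per record."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    vx = sum((x - mx) ** 2 for x in xs)
    if vx == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / vx


def diagnose(group_records, peak_heap_mb, r_min=0.9, skew_ratio=10.0):
    """Hypothetical rules mapping a dataflow/memory correlation to a suspected cause.

    group_records: records in each task's largest key group (hypothetical metric)
    peak_heap_mb:  peak heap usage of the same tasks, in MB
    """
    r = pearson(group_records, peak_heap_mb)
    per_record = slope(group_records, peak_heap_mb)
    if r < r_min:
        # Memory is not explained by how many records a task holds;
        # hypothetical catch-all pointing at job configuration.
        return "configuration", r, per_record
    largest = max(group_records)
    typical = max(statistics.median(group_records), 1)  # avoid division by zero
    if largest / typical >= skew_ratio:
        # Memory tracks group size and one group dwarfs the rest.
        return "data skew", r, per_record
    # Memory tracks group size but groups are balanced: user code likely
    # keeps every record it sees (e.g. collecting values into a list).
    return "memory-consuming user code", r, per_record


if __name__ == "__main__":
    # Made-up measurements for five reduce tasks; the last one is an outlier.
    records = [1000, 1200, 900, 1100, 50000]
    heap_mb = [110, 120, 105, 115, 2600]
    label, r, per_record = diagnose(records, heap_mb)
    print(label, round(r, 3), round(per_record, 4))
```

Using a slope alongside the correlation coefficient is one way to "quantify" a relationship rather than merely detect it. The slope gives an estimate of how much heap each additional record costs. Real rules would need thresholds calibrated against measured jobs.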
