Abstract

Apache Spark is an increasingly popular distributed computation framework based on in-memory computation, which enables iterative and interactive applications to run faster. In Spark, memory management is key to performance and to avoiding memory bloat. Compared with the earlier static memory manager, Spark 1.6 and later versions implement the unified memory manager as the default memory management model, which targets optimal memory utilization by allowing storage and execution memory to borrow from each other. However, storage memory borrowed from execution memory may be evicted frequently when memory pressure arises, because iterative applications trigger frequent recomputation of evicted cache blocks and frequent garbage collection during shuffle. This produces runtime overhead from garbage collection, cache eviction, and cache recomputation. We propose a memory constraint strategy for the unified memory manager in Spark that reduces this overhead by reducing the cache eviction size. We implement the strategy in Spark 1.6.1 and use SparkPageRank, WordCount, and GroupByTest to compare the three memory managers. Experimental results reveal that, compared with the unified memory manager, the memory constraint strategy achieves better performance, with lower job runtime and garbage collection time, as the dataset size or the number of iterations increases.
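For context, the sketch below shows the configuration knobs Spark 1.6.1 actually exposes for the two stock memory managers compared in the paper. The fractions shown are Spark 1.6 defaults; the paper's memory constraint strategy is a modification of Spark's internal UnifiedMemoryManager and is not reachable through configuration alone, so this is only an illustration of the baselines.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MemoryManagerBaselines {
  def main(args: Array[String]): Unit = {
    // Unified memory manager (default since Spark 1.6): storage and execution
    // share (heap - 300 MB reserved) * spark.memory.fraction and may borrow
    // from each other; storage memory borrowed beyond spark.memory.storageFraction
    // can be evicted when execution needs the space back.
    val unifiedConf = new SparkConf()
      .setAppName("UnifiedMM")
      .set("spark.memory.fraction", "0.75")        // default in Spark 1.6
      .set("spark.memory.storageFraction", "0.5")  // storage share immune to eviction

    // Legacy static memory manager (pre-1.6 behavior): fixed, non-borrowing
    // fractions for storage and shuffle/execution memory.
    val staticConf = new SparkConf()
      .setAppName("StaticMM")
      .set("spark.memory.useLegacyMode", "true")
      .set("spark.storage.memoryFraction", "0.6")  // default legacy storage share
      .set("spark.shuffle.memoryFraction", "0.2")  // default legacy shuffle share

    val sc = new SparkContext(unifiedConf)
    // ... run SparkPageRank / WordCount / GroupByTest workloads here ...
    sc.stop()
  }
}
```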
