Abstract

The DRAM of modern multicores is partitioned into sets, each with its own memory controller governing multiple banks. Accesses can be served in parallel across controllers and banks, but sharing of either between threads results in contention that increases latency, as do accesses to remote controllers due to the non-uniform memory access (NUMA) design. Above DRAM, a last-level cache (LLC), typically at level 3 (L3), is shared by all cores, while L1 and L2 caches tend to be core-private. This NUMA design causes significant variations in execution time for applications with large datasets, due to the different latencies incurred by remote memory node accesses or by contention in the LLC and at memory banks/controllers. As a result, single program multiple data (SPMD) applications tend to experience computational imbalance at barriers, which causes idle (wait) time for threads that arrive early at barriers and thus impairs effective processor utilization and, ultimately, performance. This work contributes a novel memory allocator called Tint-Malloc that colors memory at the LLC, bank, and controller level to ensure locality to the local memory node while reducing contention at the LLC/bank levels in software. After adding one line of code during initialization in each thread, existing applications automatically obtain colored heap space through regular malloc calls. Experimental results with the SPEC and Parsec benchmarks show that by choosing disjoint colors per thread, locality is increased, contention is decreased, and overall SPMD execution becomes more balanced at barriers than with default memory allocation under Linux as well as with prior coloring approaches.
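
To make the notion of "coloring" concrete, the following is a minimal sketch of how a physical frame address could be decomposed into an LLC color, a bank, and a controller, and how disjoint colors could be handed out per thread. It is not Tint-Malloc's implementation: the bit positions and widths, the `frame_color` type, and the functions `color_of`, `assign_colors`, and `frame_matches` are all hypothetical and chosen only for illustration. Real address mappings are platform specific and are not given in the abstract.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address layout, for illustration only. */
#define PAGE_SHIFT      12u  /* 4 KiB pages */
#define LLC_COLOR_BITS  5u   /* assumed: 32 LLC colors */
#define BANK_BITS       3u   /* assumed: 8 banks per controller */
#define CTRL_BITS       2u   /* assumed: 4 memory controllers */

typedef struct {
    unsigned llc;   /* LLC color of the frame */
    unsigned bank;  /* DRAM bank within the controller */
    unsigned ctrl;  /* memory controller (memory node) */
} frame_color;

/* Extract `width` bits of `v`, starting at bit `shift`. */
static unsigned bits(uint64_t v, unsigned shift, unsigned width) {
    return (unsigned)((v >> shift) & ((UINT64_C(1) << width) - 1));
}

/* Decompose a physical address into its (assumed) color fields. */
frame_color color_of(uint64_t paddr) {
    frame_color c;
    c.llc  = bits(paddr, PAGE_SHIFT, LLC_COLOR_BITS);
    c.bank = bits(paddr, PAGE_SHIFT + LLC_COLOR_BITS, BANK_BITS);
    c.ctrl = bits(paddr, PAGE_SHIFT + LLC_COLOR_BITS + BANK_BITS, CTRL_BITS);
    return c;
}

/* Pick one LLC color and one bank per thread on the thread's local
   controller. Colors are disjoint only while the number of threads per
   node does not exceed the number of banks (and LLC colors); beyond
   that, the modulo wraps and threads start to share. */
frame_color assign_colors(unsigned thread_on_node, unsigned local_ctrl) {
    assert(local_ctrl < (1u << CTRL_BITS));
    frame_color want;
    want.llc  = thread_on_node % (1u << LLC_COLOR_BITS);
    want.bank = thread_on_node % (1u << BANK_BITS);
    want.ctrl = local_ctrl;
    return want;
}

/* Return 1 if the frame at `paddr` has exactly the wanted colors. */
int frame_matches(uint64_t paddr, frame_color want) {
    frame_color c = color_of(paddr);
    return c.llc == want.llc && c.bank == want.bank && c.ctrl == want.ctrl;
}
```

Under these assumptions, a coloring allocator would back a thread's heap only with frames for which `frame_matches` holds, so that two threads with different `thread_on_node` values never touch the same LLC color or bank, and every frame lies on the thread's local controller.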

