
Large-scale irregular applications, such as sparse linear algebra and graph analytics, exhibit fine-grained memory access patterns and operate on very large data sets. The Partitioned Global Address Space (PGAS) model simplifies the development of distributed-memory irregular applications, because all the memory in the system is viewed logically as a single shared address space. The Chapel programming language provides a PGAS programming model and offers high productivity for irregular application developers, because remote communication is performed implicitly. However, irregular applications written in Chapel often struggle to achieve high performance because of this implicit fine-grained remote communication. In this work, we explore techniques to bridge the gap between high productivity and high performance for irregular applications using the Chapel programming language. We present high-level implementations of the Breadth-First Search (BFS) and PageRank applications. We then describe optimized versions that use message aggregation and data replication in ways that could potentially be applied automatically, improving performance by as much as 1,219x for BFS and 22x for PageRank. When compared to MPI+OpenMP implementations that employ the same types of optimizations as those applied to the Chapel codes, our optimized code is 3.7x faster on average for BFS but 1.3x slower for PageRank.
