Abstract

Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks (i.e., small tasks carrying out few computations) may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks (i.e., large tasks performing substantial computations) may not fully utilize the available CPU cores, leading to missed parallelization opportunities. In this article, we provide a better understanding of task granularity for task-parallel applications running on a single Java Virtual Machine in a shared-memory multicore. We present a new methodology to accurately and efficiently collect the granularity of each executed task, implemented in a novel profiler (available open-source) that collects carefully selected metrics from the whole system stack with low overhead, and helps developers locate performance and scalability problems. We analyze task granularity in the DaCapo, ScalaBench, and Spark Perf benchmark suites, revealing inefficiencies related to fine-grained and coarse-grained tasks in several applications. We demonstrate that the collected task-granularity profiles are actionable by optimizing task granularity in several applications, achieving speedups up to a factor of 5.90×. Our results highlight the importance of analyzing and optimizing task granularity on the Java Virtual Machine.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.