Abstract

Domain-specific languages (DSLs) have the potential to provide an intuitive interface that lets domain experts specify problems and solutions. From such specifications, code generation frameworks can produce compilable source code. However, apart from optimizing execution performance, parallelization is key for pushing the limits in problem size and an essential ingredient for exascale performance. We discuss the concepts necessary for introducing such capabilities in code generators. In particular, we elaborate on partitioning the problem to be solved and accessing the partitioned data. Furthermore, possible approaches to exposing parallelism to users through a given DSL are discussed. Moreover, we present the implementation of these concepts in the ExaStencils framework, a code generation framework for highly optimized and massively parallel geometric multigrid solvers. It takes specifications in its multi-layered external DSL ExaSlang as input. Starting from a general scheme for generating parallel code, we develop and implement widely applicable extensions and optimizations. Finally, a performance study of generated applications is conducted on the JuQueen supercomputer.

Highlights

  • Heterogeneity and variance in available hardware components are increasing, especially in the field of high-performance computing (HPC)

  • One technology that has emerged over the last decade and that represents a possible remedy is given by domain-specific languages (DSLs) in conjunction with code generation techniques

  • In the scope of ExaStencils, that is, for geometric multigrid solvers on regular grids, the different groups of leaf elements refer to specific regions of the computational grid


Summary

Introduction and Related Work

Heterogeneity and variance in available hardware components are increasing, especially in the field of high-performance computing (HPC). Apart from mapping DSL code to a compilable or executable representation, fully exploiting target machines by utilizing a wide range of possible hardware components is one of the most challenging problems. It requires carefully optimizing the code, which, fortunately, can be done in an almost fully automated manner, as recent work demonstrates [13].

Required Concepts
Data Partition
Physical Data Partition
Logical Data Partition
Access to Data
Exposing Parallelism to Potential Users
Extensions and Optimizations
Automatic Generation of Communication Statements
MPI Data Types
Communication Pattern Optimization
Interweaving Intra- and Inter-Block Communication
Automatic Overlapping of Computation and Communication
Results
Conclusions and Future Work
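
To illustrate the data-partition concepts named in the outline above, the following sketch shows a regular grid split into equally sized blocks, each extended by a ghost (halo) layer holding copies of neighboring boundary data. This is a generic, hypothetical illustration of block partitioning with halos, not code from the ExaStencils framework; the function name and the even-split assumption are ours.

```python
# Illustrative sketch (not from the paper): split a 1D regular grid into
# equally sized blocks and attach a ghost (halo) layer per side, which is
# the kind of physical/logical data partition a parallel code generator
# must reason about before it can emit communication statements.

def partition_1d(num_cells, num_blocks, ghost=1):
    """Return (owned, extended) half-open index ranges for each block.

    `owned` covers the cells a block computes on; `extended` adds up to
    `ghost` halo cells per side, clipped at the global domain boundary.
    """
    assert num_cells % num_blocks == 0, "sketch assumes an even split"
    size = num_cells // num_blocks
    blocks = []
    for b in range(num_blocks):
        lo, hi = b * size, (b + 1) * size
        ext = (max(0, lo - ghost), min(num_cells, hi + ghost))
        blocks.append(((lo, hi), ext))
    return blocks

# Example: 12 cells on 3 blocks with one ghost layer per side.
for owned, ext in partition_1d(12, 3):
    print(owned, ext)
# -> (0, 4) (0, 5)
#    (4, 8) (3, 9)
#    (8, 12) (7, 12)
```

In a generated solver, the difference between each block's extended and owned range determines exactly which cells must be exchanged with neighbors; the same bookkeeping generalizes to higher dimensions per axis.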